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Chapter  1 


Motivations  and  Strategies 


1.1  Motivations 

This  thesis  consists  of  discussions  of  several  operational  programs 
which  together  embody  an  artifact  recognition  system.  According  to  the 
Ansr-isan  tterJLfcara  Pis  Senary  aL  &&  Sanliaii  s  >  an  artifact  is: 

"An  object  produced  or  shaped  by  human  workmanship;  especially,  a 
simple  tool,  weapon,  or  ornament  of  archaeological  or  historical 
interest ." 

We  use  the  word  artifact  to  mean  a  constructed  object.  As  such,  we 
design  it  to  take  a  specific  shape.  Our  visual  recognition  system  is 
therefore  designed  to  extract  and  manipulate  shape  related  models  of 
objects.  These  models  are  "discontinuity"-based.  We  encode  objects  as 
a  cloud  of  discontinuity  points  (each  carrying  property  lists  which 
allow  point  to  point  comparisons)  which  are  interconnected  into  graphs. 
The  links  between  discontinuities  represent  relationships  into  which  the 
connected  points  enter. 

Recognition  of  a  thing  or  a  relationship,  in  the  case  of  machine 
intelligence,  can  be  defined  as  "instantiating"  that  thing  or 
relationship.  By  this  we  mean  producing  a  machine  state  which 
represents  the  thing  or  relationship.  In  our  system,  this  state  comes 
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about  by  detecting  critical  primitive  components  (edges,  regions, 
contours,  etc.),  encoding  them  as  atomic  quantities  (building  primitive 
data  blocks  in  memory  with  feature  vectors  describing  specific  component 
variations),  and  forming  relationships  between  these  atoms  (building 
graph  structures  which  represent  objects  and  object  groupings). 

In  this  work  we  have  proposed  and  implemented  a  variety  of  specific 
algorithms  for  specific  recognition  and  detection  problems,  however  the 
primary  contribution  made  has  been  the  integration  of  many  different 
processing  steps  into  a  bottom-up,  model-based  vision  system.  We  began 
our  design  assuming  that  top-down  processing  was  desirable  for  actual 
object  cognition.  To  perform  a  model  directed  parsing,  we  decided  it 
would  be  necessary  bo  get  initial  correspondences  from  models  to  scenes 
via  some  bottom-up  process.  This  bottom-up  process,  as  it  became  more 
and  more  powerful,  eventually  became  the  entire  recognition  system. 
While  we  do  not  totally  debunk  top-down  systems,  we  simply  point  out 
that  carefully  constructed  bottom-up  processing,  with  allowances  for 
multiple  interpretations,  can  perform  recognition  (as  sensors  become 
better,  computers  become  faster,  and  algorithms  more  powerful). 
Whereas,  top-down  algorithms  always  require  bottom-up  starter  processes. 

1.2  Strategies 

Our  early  visual  processing  is  single  image  based.  We  have  found 
that  processing  on  the  actual  image  arrays  is  the  most  efficient  way  to 
perform  local  convolution  operations.  But,  because  any  operation 
performed  uniformly  over  an  entire  retinal  field  (in  our  case  nominally 
256x256)  is  very  slow  for  serial  computers  (it  takes  approximately  8 
seconds  for  our  DSC  KI-10  to  sweep  through  an  image  array  for  averaging 
operations),  we  wish  to  move  into  more  abstract  spaces  quickly  (into 
spaces  described  by  image  graphs). 

We  begin  by  performing  edge  detection  (Chapter  2).  We  have 
invented  two  new  edge  detectors  which  allow  detection  of  sharp  and 
diffused  intensity  transitions.  We  have  also  evaluated  these  two 
detector  schemes  along  with  some  other  techniques  described  in  the 
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literature.  Our  techniques  are  good  for  detecting  shadow  and  object 
boundaries  in  a  noisy  environment,  if  the  noise  statistics  are  known  and 
sufficiently  below  the  level  of  significant  intensity  changes.  These 
restrictions  apply  readily  for  vidicon  and  solid  state  visual  band 
mosaic  sensors. 

Following  edge  detection,  we  segment  regions  by  convolving 
homogeneity  operators  with  the  edge  detected  images.  We  use  the  absence 
of  edges  in  an  area  to  form  connected  regions.  In  effect,  we  fill  in 
broken  edge  chains  which  enclose  compact  areas,  and  generate  some 
classes  of  subjective  contours  (Chapter  3).  We  then  use  a  scan  line 
based  (non-following)  algorithm  to  find  vertex  points  and  build  contour 
strings.  These  strings  are  smoothed  and  written  as 
vertex-string-surface  graphs  (V-S-S).  The  contour  encoding  is  made 
simpler  by  the  presence  of  accurate  region  assignments  prior  to  boundary 
extraction. 

Color  images  can  be  processed  by  performing  three  times  as  much 
work  prior  to  region  and  vertex-string  processing  (by  edge  detecting  on 
red,  green,  and  blue  images  -  superimposing  the  results).  For  textured 
scenes  i  _iificantly  more  processing  is  required.  Rough  areas  are 
transformed  into  smoothed  areas  in  new  images.  These  images  are  re-edge 
detected.  The  new  edge  data  is  precipitated  into  the  old  edge  images 
from  the  original  intensity  image  (after  edges  in  rough  regions  have 
been  removed).  Therefore,  the  amount  of  additional  processing  for 
texture  data  is  proportional  to  the  number  of  individual  texture 
smoothing  processes  implemented  (for  a  good  texture  system  probably  15 
to  100  times  more  processing  would  be  required). 

Following  V-3-3  formation  from  single  images,  we  no  longer  operate 
on  the  sampled  image  space.  We  transform  the  V-S-S  graph  into  a  half 
chunk  graph  (H-C).  The  half  chunk  graph  encodes  scenes  as 
interconnected  elemental  curvature  elements  (the  half  chunks),  each 
having  a  property  list  contain  image  intensity  properties  and  spatial 
properties  (positions,  tangents,  and  tangent  changes).  For  the  domain 
of  artifacts,  these  curvature  elements  contain  the  critical  object 
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features  required  for  recognition,  and  allow  object  encodings  resistant 
to  changes  in  viewing  angle.  We  correlate  feature  groupings  between 
pairs  of  H-C  graphs  for  depth  and  motion  information  (Chapter  4).  The 
result  of  each  graph  matching  is  a  transform  which  takes  one  graph 
dimensionally  into  the  other,  and  a  new  fused  graph  which  is  made  by 
merging  as  many  compatible  features  as  possible  between  the  input 
graphs . 

Following  graph  matching  many  object  subgroupings  have  been 
disassociated  (parts  of  single  objects  are  connected  in  depth  and  nave 
common  motion  transforms,  however  parts  of  different  objects  tend  to  be 
disconnected  or  do  not  move  together).  To  further  disassociate  object 
groupings  a  "Waltz-like"  labeling  scheme  is  employed.  This  process  is 
based  on  vertex  and  region  labeling.  Labels  are  constrained  by  semantic 
types  and  measured  boundary  feature  properties  (Appendix  D). 

Our  object  recognition  program  operates  on  H-C  graphs  which  are 
assumed  to  be  single  objects  (or  descriptive  fragments  of  single 
objects).  The  advantage  of  the  H-C  graph  for  object  modeling  is  that  it 
is  "feature"  centered  rather  than  "object"  centered.  The  graph  is  3cale 
and  coordinate  system  invariant  (Chapter  5).  We  match  object  H-C  graphs 
by  extracting  histograms  of  subgraph  components  between  input  graphs  and 
graphs  stored  in  an  object  library.  These  histograms  are  then  compared. 
It  is  proposed  that  objects  be  modeled  as  the  histograms  alone.  It  is 
also  proposed  that  the  H-C  graphs  be  directly  generated  from  edge  data 
(without  the  intermediate  step  of  V-S-S  graph  formation). 
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Chapter  2 

Two  New  Edge  Detectors 


2.1.  Introduction 

We  have  done  extensive  experimentation  in  low-level,  bottom-up 
segmentation  and  have  found  edge  detection  based  techniques  to  be  very 
tractable  from  both  theoretical  and  practical  standpoints.  We  have  In 
the  process  of  this  work  designed  two  interesting  detection  methods,  one 
difference-based  and  the  other  adaptive  threshold  based.  These 
techniques  are  desirable  because  they  find  "thin"  edge  strings  (i.e. 
unlike  strict  gradient-based  approaches  where  thick  bands  of  edge  data 
are  generated),  they  may  be  tuned  to  give  good  signal/noise  ratios  for  a 
variety  of  edge  transition  sizes  and  image  noise  statistics,  edge 
position  may  be  computed  to  greater  precision  than  the  basic  pixel  size, 
and  implementation  is  efficient  in  hardware  or  software.  Our  techniques 
are  based  on  the  application  of  arrays  of  edge  detectors,  each  sensitive 
to  a  different  group  of  edge  types.  An  algorithm  for  selecting  the  most 
appropriate  detector  response  in  a  particular  region  is  used  to 
consolidate  the  results  for  all  detectors. 
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2.2.  Related  Work 

Moat  of  the  work  in  low-level  visual  segmentation  can  be  grouped 
into  one  of  five  main  categories.  These,  are  segmentation  by  feature 
histogram,  by  region-based  processing,  by  gradient-based  edge  detection, 

by  edge  detection  with  non-maxima  suppression,  and  by  edge  template 

1 

matching.  Our  work  has  elements  of  last  two  techniques. 

The  classic  work  in  the  area  of  visual  segmentation  by  histogram 
based  techniques  was  done  by  Ohlander  [45].  In  his  work,  one 
dimensional  histograms  of  various  color  features  were  partitioned  using 
a  heuristic  concept  of  histogram  peak  "goodness"  (a  combination  of  pea* 
shape  and  isolation).  Then  the  partitions  were  reprojected  into  image 
space  to  generated  the  required  object  partitions.  Similar  work  has 
been  done  by  Schacter,  et  al.  [56]  and  Hanson  [20],  using  two 
dimensional  histograms,  with  similar  results. 

These  histogram-based  techniques  allow  good  results  for  some  cases, 
however  they  ignore  some  fairly  important  problems.  First,  it  has  been 
our  experience  that  camera  systems  tend  to  have  slow,  uniform  intensity 
variations  over  the  retinal  area  which  are  not  caused  by  lighting  or 
object  reflectance.  In  color  imaging,  this  variation  does  not  track 
rell  from  color  spectrum  to  color  spectrum.  What  is  worse,  the 
amplitude  of  the  variations  may  change  with  overall  ambient  lighting. 
In  summary,  all  the  points  associated  with  a  given  surface  may  not  give 
rise  to  a  well  defined  peak  in  the  histogram  space.  Obviously, 
histogram  peak  selection  can  then  break  down. 

A  tacit  assumption  of  histogram-based  techniques  is  that  there  is  a 
one-to-one  relationship  between  histogram  peaks  and  object  surfaces. 
Under  this  assumption,  each  peak  may  be  thought  of  as  an  Independent, 
unimodal  feature  distribution  function,  and  peak  selection  is  simply  the 
identification  of  the  parameters  of  the  distribution.  The  basic  problem 
in  histogram  techniques  can  be  traced  back  to  this  assumption.  While  in 
many  cases  each  peak  is  predominantly  from  a  single  object  surface,  we 
nave  absolutely  no  guarantee  that  this  is  the  case.  When  it  is  not, 
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histogram  techniques  are  not  valid. 

Region-based  techniques  have  been  used  by  Brice  and  Fennema  [6], 
Tenenbaum  [60],  Feldman  and  Takimovsky  [14],  and  others.  The  basic 
approach  consists  of  selecting  a  region  seed  point  (possibly  any  point, 
or  alternatively  some  archetypical  point),  and  adding  adjacent 
unassigned  points  minimizing  some  difference  criterion.  Region-based 
techniques  are  theoretically  equivalent  to  edge  base  techniques  except 
to  the  extent  that  they  Incorporate  higher  level  information.  Without 
this  high  level  information  the  contours  generated  between  regions  will 
exactly  coincide  to  contours  of  maximum  difference  (the  same  as  the 
contours  generated  by  a  non-maxima  suppressing  edge  detection  scheme). 
Brice  and  Fennema  incorporated  a  figure  "goodness"  criterion  by  allowing 
region  boundaries  to  grow  at  a  constrained  rate.  Tenenbaum  uses  a 
semantic  criterion  in  addition  to  simple  intensity  information  to  order 
merging.  Feldman  and  Takimovsky  Incorporate  a  Baysian  decision 
technique  into  their  region-based  system. 

We  feel  that  the  initial  feature  extraction  phase  of  processing  is 
not  the  place  to  incorporate  high  level  knowledge.  If  this  is  done,  it 
becomes  difficult  to  evaluate  performance  and  detracts  from  overall 
low-level  system  generality.  The  system  starts  seeing  what  it  wants  to 
see,  rather  than  what  is  actually  there. 

We  have  expanded  edge  detection  based  attacks  on  the  segmentation 
problem  into  three  different  approaches  because  each  represents  a 
slightly  different  view  of  what  an  "edge"  is.  In  the  pure  gradient 
approach,  an  edge  is  a  high  gradient  point.  This  view  of  things  goes 
back  at  least  to  Roberts  [51],  and  has  persisted  in  work  by  O'Gorman  and 
Clowes  [46],  Shirai  [58],  and  others.  The  problem  with  this  idea  of  an 
edge  is  that  over  a  thick  band  at  surface-surface  interfaces  we  get  high 
gradient  points. 

Rosenfeld  [52][53]  and  Burr  [7]  consider  an  edge  point  to  be  where 
the  maximum  (minimum)  gradient  occurs.  This  maximum  point  suppresses 
all  non-maxima  in  some  area  of  influence  around  Itself.  Rosenfeld,  as 


3 


an  additional  refinement,  sweeps  tne  image  with  operators  of  ascending 
size  (ixi,  2x2,  4x4,  3x3,  etc.)  to  average  over  random  textures.  His 
multiple  operator  scheme  supresses  non-maxima  over  eacn  set  of  operators 
of  a  given  size.  Then  over  each  area,  the  largest  operator  giving  a 
strong  gradient  is  selected,  suppressing  smaller  operators  in  the  same 
area.  In  Rosenfeld's  paper  it  can  be  seen  that  in  textured  environments 
this  approach  has  the  desired  effect.  However,  in  the  domain  of  machine 
parts,  where  objects  themselves  display  fine  detail,  Rosenfeld's 
algorithm  for  multiple  edge  detection  does  not  work.  It  is  to  this 
problem  we  address  ourselves. 

An  alternative  definition  of  an  "edge"  is  any  intensity  pattern 
which  fits  an  "edge"  template.  The  discrete  difference  mask  used  by 
Roberts  could  oe  viewed  as  a  trivial  template.  Selection  of  a  maximum 
point  could  be  viewed  as  selecting  a  point  where  the  difference  template 
correlates  with  the  Image  maximally.  Several  operators  of  a  more 
complex  nature  come  about  by  viewing  edges  in  this  light. 

The  operators  of  Yakimovsky  assume  edges  are  interfaces  between 
sets  of  points,  each  set  being  described  by  a  normal  distribution.  The 
mathematics  for  distribution  parameter  comparison  is  used  to  form  a 
function  of  edge  strength  in  an  area. 
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We  have,  for  comparison  purposes,  programmed  an  operator  of  this 
sort  with  non-maxima  suppression,  using  the  neighborhood  shapes  proposed 
in  [66].  It  should  be  pointed  out  that  the  operator  of  Yakimovsky  is  in 
fact  a  superposition  of 
C30], 

s  < 
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this  points  up  some  problems.  The  mean  comparison  component 
behaves  well  when  the  operator  is  not  centered  on  an  edge  interface  (the 
mean  comparison  is  more  or  less  a  finite  difference,  therefore  an 
approximation  to  the  gradient).  However,  the  sigma  component  peaks  at 
interfaces  where  standard  deviation  changes  and  on  both  sides  of  ones 
where  the  mean  changes.  This  can  cause  false  edges  to  be  detected.  The 
statistical  assumption  of  two  independent  distributions  is  valid  only 
right  on  surface-surface  interfaces. 


The  operator  of  Hueckel  [22][23][24]  comes  about  by  expressing  a 
family  of  templates  in  terms  of  an  orthonormal  family  of  functions.  We 
have  programmed  a  version  of  the  original  operator,  but  have  been 
thoroughly  disappointed  with  the  results.  From  the  paper  of  Hero  [43] 
it  can  be  seen  that  the  Hueckel  type  approach  can  be  done  with  different 
basis  functions  quite  efficiently.  We  have  reason  to  believe  that  the 
original  basis  functions  of  Hueckel  are  less  than  satisfactory  and  the 
ones  proposed  by  Mero  are  more  advantageous.  Using  these  basis 
functions  it  was  shown  that  the  Hueckel  type  operator  is  more  or  less 
equivalent  to  a  gradient.  It  should  be  pointed  out  that  like  the 
gradient,  on  gradual  intensity  slopes  thick  edges  (more  than  a  single 
point  wide)  occur.  Also,  angular  resolution  for  this  type  of  detector 
turns  out  to  less  than  one  might  expect. 


The  point  to  be  drawn  from  the  previous  discussions  is  that  most  of 
the  techniques  proposed  do  not  differ  substantially  from  a  theoretical 
point  of  view.  Practically,  performance  is  more  a  function  of  operator 
geometry  and  image  noise  statistics  than  the  models  selected  for  edge 
strength  computations  (all  model  thus  far  discussed  eventually  reduce  to 
subtraction  of  neighborhoods).  Our  system  uses  a  variable  geometry  to 
average  over  image  noise. 

2.3.  Difference-Based  Multiple  Operator  Technique 


This  edge  detection  system  is  based  on  the  simple  step  edge 
detector.  Detection  of  steps,  that  is,  places  where  intensity 
distributions  each  having  different  means  meet,  can  be  accomplished 
optimally  by  simple  differencing.  The  basic  difference  is  then  scaled 
by  a  factor  involving  the  standard  deviations  of  the  two  distributions 
or  heuristically,  a  term  based  on  the  variation  around  the  mean  change 
(see  formula  2.2.2).  The  formula  may  be  slightly  modified  as  follows  if 
we  assume  equal  neighborhood  sizes  and  take  tf",  =  f  (the  standard 
deviation  for  the  overall  image). 
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As  noted  in  section  2.2  using  the  moments  6,  and  to  characterize 
texture  is  dubious  because  these  terms  respond  very  strongly  also  to 
mean  variations,  therefore  we  feel  it  reasonable  to  simply  do  the  above 
scaling.  If  we  assume  t  as  the  threshold  for  neighborhoods  consisting 
of  a  single  measurement  then  for  neighborhoods  consisting  of  n 
mesurements  we  expect  the  following: 


2.3.3.  T(» O  *  y  4  m  fl/T**  Rf 


2.3.3  is  the  result  we  would  expect  for  purely  Gaussian 
distributions,  however  our  work  indicates  that  the  square  root  law 
actually  generates  thresholds  that  are  too  low.  As  a  fix  we  can  choose: 

T(n’)  *  t  [  ♦  /*" ] 


2. 3. 


For  our  imaging  system  we  nave  found  thresholds  plotted  as  boxes  in 
Figure  2.1  are  quite  good  (icsO.6).  the  constant  <  can  be  considered  the 
degree  to  which  difference  due  to  "noise"  accumulates  linearly  rather 
than  normally. 
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Plot  of  T  (fidge  Detection  Threshold)  vs.  N  (Sample  Size) 

Figure  2.1 


As  is  the  case  for  any  difference-based  detector,  there  will 
typically  be  several  adjacent  points  wnere  edge  strength  is  greater  tnan 
T.  *fe  only  want  bands  of  width  one.  To  accomplish  this  we  observe  that 
what  goes  up  must  come  down,  or  the  value  of  X  in  2.3*1  alternates  sign 
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for  each  new  step.  A  peak  is  recognized  as  all  the  points  where  X  is 
greater  than  zero,  or  all  the  points  where  X  is  less  than  zero.  The 
center  of  the  peak  is  either  its  absolute  maximum  or  its  center  of  mass 
(generally  either  measure  is  relatively  noise  free  and  nearly  located  at 
the  same  position).  The  center  of  mass  method  is  better  in  that  it 
allows  positioning  more  finely  than  the  basic  pixel  size.  Both  methods 
yield  edges  of  width  one. 

The  use  of  relative  maxima  rather  than  absolute  maxima  thinning  has 
been  used  extensively  and  is  really  at  the  heart  Rosenfeld's  notion  of 
non-maxima  suppression.  The  problems  of  using  pure  relative  maxima 
techniques  is  clearly  shown  in  Figure  2.4.  In  Figure  2.2  we  show  an 
intensity  profile.  Figure  2.3  shows  peak  selection  using  either  the 
center  of  mass  method  (CM)  or  the  absolute  maximum  method  (MAX-MIN). 
Figure  2.4  shows  multiple  peaks  detected  where  only  one  should  be  using 
relative  maxima  detection. 


In  short,  the  single  detector  indicates  one  edge  per  peak  for  any 
peak  having  points  with  an  edge  strength  greater  that  T.  We  have  thus 
far  restricted  our  discussion  to  one  dimensional  step  detection.  To 
expand  to  two  dimensions,  the  one  dimensional  edge  detectors  are  swept 
along  each  horizontal  and  vertical  line,  generating  two  basic  edge 
directions  (more  orientations  may  be  used  in  principle,  however  two  is 
the  minimum  required). 

The  sweeping  operation  is  performed  for  several  sized  operators  at 
the  same  time,  the  larger  differences  being  incrementally  computed  from 
the  smaller  ones  (Figure  2.5).  Larger  sized  differences  allow  more 
discrimination,  for  diffused  broad  intensity  transitions,  such  as  those 
found  in  shadows  cast  by  non-point  sources  and  highlights.  The  fastest 
transition  detector  that  has  strength  greater  than  its  corresponding  T 
marks  the  edge  position.  All  detections  from  larger  detectors  are 
disabled  to  prevent  feature  smearing.  Many  low  contrast  contours  can  be 
discriminated  only  by  detectors  using  many  cells  averaged  (note  Figure 
2.1,  the  larger  the  operator  sample  the  lower  the  effective  threshold  T 
per  sample).  Also  the  marking  detector  index  number  becomes  the  edge 
diffusion  size  (proportional  to  the  edge  peak  width).  Figure  2.6  shows 
a  typical  intensity  profile.  Figure  2.7  shows  the  output  of  a  small 
detector.  Figure  2.8  shows  that  a  larger  detector  is  able  to  pull  more 
contours  out  of  the  background  noise  than  the  smaller  one  (Figure  2.7). 
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Operator  Neighborhoods 
Figure  2.5 


The  preference  towards  small  detectors  is  analogous  to  Marr's 
requirements  of  edge  Isolation,  in  the  generation  of  his  primal  sketches 
[ 32  ]  C  33  ]  [ 35  ]  (really  edge  detected  images).  We  wish  the  simplest 
explanation  of  intensity  phenomena.  To  accomplish  this  we  need  to 
assume  minimum  interaction  between  edge  detector  outputs,  i.e.  isolated 
edge  steps.  The  simplest  way  to  minimize  detector  overlaps  is  to  use 
the  smallest  detectors  that  give  non-noise  detections.  If  two  small, 
isolated  detectors  fire  (i.e.  two  successive  edges),  and  over  the  same 
region,  one  larger  detector  fires,  the  simplest  explanation  is  that  the 
large  detector  is  firing  due  to  both  small  edges.  Figure  2.9  shows  now 
an  operator  can  be  large  enough  to  smear  together  isolated  edges. 
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An  Operator  Large  Enough  To  Smear  Edges 
Difference,  oize*i2 
Figure  2.9 


We  may  adjust  the  noise  model  (method  of  computing  T  for  each 
detector)  and  the  range  of  detectors  (typically  sizes  1  through  5)  to 
fit  a  wide  variety  of  imaging  systems.  It  is  our  experience  that  this 
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is  a  shortcoming  of  most  other  detection  schemes.  Contrasting  this 
system  and  the  system  proposed  by  Rosenfeld  [52][533  several  major 
differences  are  apparent .  In  our  scheme  operator  sizes  are  chosen  in  a 
linear  sequence  rather  than  a  geometric  one.  The  smallest  operators 
have  proportionally  more  influence  than  the  larger  ones,  rather  than  the 
other  way  around  as  in  the  Rosenfeld  scheme.  In  our  system,  low 
amplitude,  random  texture  is  ignored  via  an  image  noise  model.  We  also 
characterize  the  size  of  the  transition  region  for  the  intensity  change. 

2.4.  Adaptive  Threshold  Multiple  Operator  Technique 

After  getting  good  results  from  the  previous  system  of  edge 
detection  we  wondered  if  we  could  get  better  results  after  some  form  of 
second  order  image  enhancement.  To  our  chagrin,  the  results  were  much 
worse.  This  was  because  the  enhancement  caused  ringing  at  strong 
intensity  steps.  This  ringing  was  picked  up  as  "peaks"  in  the 
difference  space  and  therefore  caused  multiple  edges  when  of  sufficient 
amplitude.  These  problems  caused  us  to  consider  a  non-differential 
approach  to  edge  detection  which  is  interesting  for  several  reasons. 
First,  the  implementation  is  extremely  fast  and  extremely  simple.  Peak 
selection  and  thinning  are  almost  trivial.  And  last,  the  detection 
scheme  has  very  nice  biological  analogs  (works  better  with  more 
enhancement,  displays  mach-band  responses). 

The  first  step  is  to  compute  from  an  input  image  a  "fast"  image  and 
a  "slow"  image.  The  slow  Image  is  formed  by  computing  the  average 
intensity  of  a  large  neighborhood  (in  our  case  20  by  20)  centered  at 
each  cell.  This  picture  is  basically  an  artificially  defocused  image. 
It  may  be  thought  of  as  a  "threshold"  picture  where  each  cell  is 
proportional  to  the  average  light  level  in  the  larger  neighborhood. 

The  fast  picture  is  an  edge  enhanced  picture.  We  do  the 
enhancement  in  the  X  and  I  directions  independently  using  the  function 
in  Figure  2.10  evaluated  at  every  cell  position  along  a  line  orthogonal 
to  the  direction  of  the  edges  being  extracted. 


Enhancement  Function 
Figure  2.10 

The  slow  picture  is  then  subtracted  cell-wise  from  tne  fast 
picture,  generating  a  new  picture  which  is  zero  in  homogeneous  regions 
and  at  edges,  non-zero  around  edges.  At  this  point  we  have  computed  a 
quantity  analogous  to  the  "lightness"  proposed  by  Horn  in  [21],  when  we 
consider  that  the  subtraction  is  actually  comparable  to  a  division 
because  of  our  imaging  system.  This  system  has  a  log  of  light-level 
response  (Figure  2.11  -  this  is  generally  true  for  vidicon  imagers). 
This  picture  is  then  thresholded  at  plus  or  minus  some  threshold  T.  All 
cells  with  value  7,  -T  <  V  <  T  are  set  to  CO.  Calls  with  value  V,  V  >  T 
are  set  to  01,  and  cells  with  value  7,  7  <  -T  are  3et  to  10.  In  this 
way  we  obtain  a  three  level  picture  (10,00,01). 


Figure  2.11.  Quantization  Level  vs.  Light  Level 
and  ln[Llght  Level] 
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Edge  extraction  along  a  line  now  becomes  a  simple  binary  pattern 
matching  process.  An  edge  may  now  be  expressed  as: 

2.4.1.  <01X00  00  ...  00  00X10> 

or 

<10X00  00  ...  00  00X01  > 

The  central  string  of  00  codes  may  have  length  zero  for  the  fastest 
transition  edges,  or  up  to  N  where  M  is  the  longest  acceptable 
transition.  The  leading  01  or  10  codes  may  be  repeated.  The  first 
pattern  corresponds  to  a  negative  going  edge,  the  second,  to  a  positive 
going  edge.  For  each  size  edge  template  (sizes  1  to  N),  we  sweep  the 
three  level  pictures  in  the  X  and  X  directions.  We  light  a  bit  over 
each  position  where  an  edge  template  matches  (there  are  N  bit  planes, 
one  for  each  size  template).  Figure  2.12  shows  an  intensity  profile, 
the  defocused  intensity  profile,  and  the  enhanced  intensity  profile. 
Figure  2.13  shows  the  subtracted  profile,  with  peak  areas  marked(+0- 
three  level  data) . 
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As  indicated  earlier,  peak  selection  and  thinning  in  the  edge  bit 
planes  is  trivial.  Any  cell  in  a  given  bit  plane  belonging  to  a  string 
of  edge  points  of  length  greater  than  two  should  be  ignored.  (There 
will  be  edges  from  edge  templates  with  fewer  00  codes  marking  the  step, 
having  a  run  length  of  two  or  one.  This  is  an  isolation  condition, 
analogous  to  small  edge  detectors  preempting  larger  ones  in  our 
difference-based  technique.)  If  a  string  of  two  is  encountered,  the 
position  of  the  actual  edge  is  between  the  two  marks.  If  a  singleton 
string  alone  is  present,  it  directly  marks  the  edge.  We  simply  record 
mark  position  for  edge  position  and  bit  plane  index  for  transition 
region  size.  A  hypothetical  hardware  implementation  for  one  line  of 
detectors  is  quite  illuminating  (Figure  2.14).  We  think  the  similarity 
in  geometry  between  this  structure  and  ones  observed  in  biological 
visual  systems  is  interesting. 


Circuit  Diagram  For  One  Line  Of  Detectors 
Figure  2.14 
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Mach  band  response  in  this  detector  scheme  comes  about  as  a  result 
of  the  second  order  enhancement  used  in  generating  the  "fast"  image. 
This  enhancement  causes  the  results  in  Figures  2.15-2.16,  when  the  ramp 
breakpoint  is  sufficiently  great  to  produce  peaks  of  greater  amplitude 
than  T  (threshold  marked  in  Figure  2.15). 
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Figure  2.17  shows  the  images  from  which  previously  displayed 
intensity  plots  were  derived.  White  lines  mark  the  various  slices. 


Figures  2. 2-2. 4 


Figures  2. 6- 2. 9 


To  test  the  relative  merits  of  our  two  edge  detection  schemes  and 
those  of  several  other  researchers,  we  have  prepared  several 
characteristic  test  images.  The  first  six  are  computer  generated 
intensity  ramps,  one  set  of  three  vertically  oriented  (Figures 
2.18-2.20),  and  the  other  set  of  three  diagonally  oriented  (Figures 
2.21-2.23).  The  first  in  each  triple  has  no  additive  noise.  The  second 
has  Gaussian  noise  of  3=0.5  (nominally  the  noise  induced  in  a 
quantization  system  -  the  noise  levels  encountered  in  our  real  images). 
The  last  in  each  set  have  Gaussian  noise  of  S=1.0.  These  pictures  test 
noise  immunity,  the  degree  to  which  edge  detectors  can  follow  diffusing 
steps,  and  to  a  limited  extent,  orientation  bias. 
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Figure  2. IS.  Vertical  Ramp,  S=0.0 


2.21.  Diagonal  Ramp,  S=0.0 
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2.22.  Diagonal  Ramp,  S=0.5 


2.23.  Diagonal  Ramp,  5=1.0 


Me  have  also  digitized  some  characteristic  images  taken  in  our 
laboratory  to  test  the  various  edge  detection  techniques  under  more 
practical  conditions.  These  images  are  252x238x6  bits  and  have 
typically  +1  or  -1  level  random  intensity  variation  (occasionally  *2  or 
-2).  Figure  2.24  tests  repetitive  edge  detections.  Figure  2.25  tests 
the  degree  to  which  orientation  and  operator  size  biases  affect  small 
bodies.  Figure  2.26  tests  interactions  between  edges  of  different 
orientations.  Figure  2.27  tests  operator  performance  in  a  scene  with  a 
wide  range  of  intensity  values  and  edge  transition  sizes  (highlights, 
shadows,  object-object,  object-ground  interfaces). 

For  the  tests  we  have  implemented,  in  addition  to  our  two  detection 
schemes,  a  Hueckel  operator  [23],  a  faster  Hueckel-like  operator  [43],  a 
Yakimovsky  operator  [66]  with  non-maximum  suppression,  a  Rosenfeld 
non-linear  operator  [ 52 ] [ 53 ] *  and  a  simple  gradient  detector  with 
non-maximum  suppression  [7].  The  rules  of  the  game  are  as  follows. 
Each  detector  is  tuned  using  the  S=0.5  artificial  pictures  so  that 
performance  is  acceptable.  If  not  acceptable,  at  least  as  good  as 
possible.  Then  the  operator  i3  applied  to  all  the  other  test  pictures 
using  the  3ame  confidences  and  thresholds. 

We  intend  the  reader  to  examine  the  various  edge  pictures  and  draw 
his  own  qualitative  opinions  on  the  relative  merits  of  the  various 
techniques  tested,  however  we  offer  some  discussion  of  our  results  and 
several  quantitative  measures  (these  for  the  constructed  images).  Table 
2.1  tabulates  correct  and  incorrect  edge  densities  for  each  operator. 
Table  2.2  tabulates  the  length  of  diffused  contour  followed  by  each 
operator  (1.0  if  contour  is  followed  across  the  entire  test  picture). 
Table  2.3  tabulates  positional  variation  for  edges  per  unit  length  along 
the  diffused  ramp.  Table  2.4  suggests  the  minimum  stripe  size  an 
operator  can  detect  without  smearing  (size  in  pixels). 
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Figure  2.24.  Lines 
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Operator  j  False  Detections  !  True  Detections 
Name  !  Per  Unit  Area**  !  Per  Real  Edge* 


Hueckel 

0.03 

0.34 

Psuedo-H*»* 

0.0005 

0.56 

Yakimovsky 

0.004 

0.73 

Burr 

0.02 

0.19 

Rosenfeld 

0.0005 

1.0 

Diff .***• 

0.002 

0.97 

Adapt.***** 

0.03 

0.19 

*  Note  that  True  Detections  Per  Real  Edge  also  reflects 

sensitivity  of  the  specified  operator  to  diffused  ramps  (some 
operators  nave  trouble  following  the  diffusing  ramp  very  far, 
therefore  score  poorly). 

**  This  reflects  ambient  noise  level  in  the  S=1.0  pictures 

(non-central  ramp  edges  -  i.e.  edges  that  should  not  be 
detected). 

***  Hueckel-like  Operator  (Appendix  A). 

****  Difference-based  Multiple  Operator. 

*****  Adaptive  Threshold  Multiple  Operator. 


Table  2.2..  Edge  Following  Ufinfifeha 


Operator  !  Diagonal  Length*  !  Vertical  Length 


Hueckel 

0.2 

0.4 

Psuedo-H 

0.2 

0.62 

Yakimovsky 

0.47 

0.77 

Burr 

0.19 

0.19 

Rosenfeld 

1.0 

1.0 

Diff. 

1.0 

1.0 

Adapt . 

0. 19 

0.2 

Lengths  are  scaled  by  total  diffused  contour  length. 

Table  2.3.  Positional  Variation  Along  Diffusing  Ramp 


Operator  !  Positional  Variation* 


Hueckel 

0.09 

Psuedo-H 

0.11 

Yakimovsky 

0.  15 

Burr 

0.20 

Rosenfeld 

0.09 

Diff. 

0.00 

Adapt . 

0.05 

Distance  variation  orthogonal  to  diffused  ramp  scaled  by  ramp 


Table  &L.  Minlaw  Sfcrlaa  siaa 

Operator  I  Stripe  Width  (Cells) 


Hueckel 

5-7* 

Psuedo-H 

5-7* 

Takimovsky 

7-9! 

Burr 

3-4* 

Rosenfeld 

l-2« 

Diff. 

Adapt . 

3* 

*  Set  by  basic  neighborhood  size. 

**  Set  by  smallest  neighborhood  size.  Note,  however,  due  to  large 
operator  preference,  many  times  operation  in  areas  with  small 
scrips  is  dictated  by  the  largest  neighborhood.  In  this 
implementation  16x16. 

•**  Set  by  smallest  neighborhood  size. 

2.6.  Sensitivity,  Noise  Immunity,  and  Shape  Deformation 

If  by  sensitivity,  we  mean  ability  to  detect  gradual  gray  scale 
transitions,  clearly  the  Rosenfeld,  Takimovsky,  and  our  difference  based 
operators  are  the  best .  The  Xakimovsky  does  well  because  of  a  large 
neighborhood  size.  This  will  be  a  draw  back  in  detecting  smaller 
features  in  scene.  Both  our  operator  and  Rosenfeld' s  do  well  in  the 
domain  of  diffused  lighting  because  they  both  have  the  ability  to 
average  over  large  areas,  thus  detecting  small  gradual  changes.  The 
Hueckel  and  Hueckel-like  operators  are  a  disappointment.  With  such 
large  neighborhoods  (9x9)  one  would  hope  for  better  gradual  slope 
detection.  Possibly,  this  can  be  explained  by  considering  that  these 
operators  are  quite  finely  tuned  to  detect  step  changes.  The  gradual 
slope  generate  too  low  a  confidence  step. 

In  the  real  pictures,  the  results  with  respect  to  sensitivity  seem 
to  corroborate  those  from  the  artificial  images.  The  Hueckel  and 
Hueckel-like  operators  are  universally  less  sensitive  than  the  various 
difference-based  operators  (Diff.,  Adapt.,  Rosenfeld,  and  Burr).  In  the 
line  picture,  many  lines  detected  are  single  (Instead  of  paired),  and 
miss  directed  segments  are  common.  In  dots,  the  dot  shapes  are  badly 
smeared  and  deformed.  Directional  interactions  in  the  grid  picture  seem 
to  occur  at  each  vertex.  These  operators  do  the  best  in  the  metal 
cylinders,  but  still  not  better  than  the  other  techniques. 
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The  Yakimovsky  operator,  on  real  pictures,  tends  to  smear  small 
detail  as  is  to  be  expected  with  such  a  large  neighborhood  operator.  In 
lines,  double  lines  become  single  lines.  In  the  dots  shot,  each  dot  is 
enlarged  and  flared  at  the  corners.  (Corners  on  a  round  dot?  See  for 
yourself.)  These  flaring  corners  come  from  the  peculiar  neighborhood 
shapes  proposed  for  this  operator  [66].  The  grid  shows  the  3ame  signs 
as  the  lines  picture  (singles  where  there  should  be  doubles).  In  the 
metal  cylinder  picture,  we  get  a  good  outline  but  are  weak  on  internal 
details  and  shadow  completions.  Yakimovsky 's  operator  is  better  than 
the  Hueckel  and  Hueckel-like  operators,  but  is  less  effective  than  the 
difference-based  techniques. 

The  Rosenfeld  non-linear  operator  does  very  well  for  somethings  and 
poorly  for  others.  On  gradual  vertical  ramp  detection,  it  gets  nearly  a 
perfect  score.  For  diagonal  following,  as  the  ramp  gets  further 
diffused,  edge  position  varies  wildly.  Thi3  occurs  in  part  due  to  our 
use  of  only  two  orientations.  The  diagonal  orientation  causes  the 
vertical  edge  peaks  to  be  displaced  from  the  horizontal  peaks  due  to  the 
square  shape  of  the  averaging  windows  used  in  differencing.  More 
orientations  in  the  detector  system  could  help.  Also  narrow 
(non-square)  averaging  rectangles  would  minimize  the  problem. 

In  the  real  picture  the  Rosenfeld  operator  does  poorly  because  its 
large  operator  preference  causes  it  to  lock  on  lighting  (metal 
cylinders)  effects  or  line  density  (lines)  effects.  We  have  not 
included  textured  pictures  as  in  [52],  though  if  we  had  this  operator 
would  have  performed  well  on  them.  Basically  we  see  a  common  idea  being 
applied  to  two  different  problems,  getting  rather  different  results. 
The  Rosenfeld  strategy  is  to  bias  a  multiple  template  detection  scheme 
towards  the  large  operators,  to  average  out  textures.  Our 
difference-based  tecnnique  is  biased  towards  the  small  operators  which 
are  significant,  to  detect  all  fine  structure  without  undue  smearing. 

The  Burr  techniques  and  our  difference-based  technique  are  very 
similar.  Burr's  detector  has  the  same  shape  and  size  as  our  detector 
number  3  (Figure  2.5).  We  used  center  of  mass  or  absolute  maximum  peak 


selection,  while  Burr  uses  relative  maximum.  This  explains  Burr's  lower 
noise  immunity.  Being  a  single,  small  neighborhood,  Burr's  detector  has 
problems  when  a  edge  becomes  to  diffused.  This  shows  in  the  ramp 
pictures  and  later  in  the  metal  cylinders  picture  (Note  the  left  most 
shadow  in  our  difference-based  output  and  Burr's.  He  misses  edge  points 
critical  in  completing  the  shadow  contour  because  they  are  too  diffused 
and  weak) . 

Our  difference-based  operator  does  well  on  all  the  test  pictures. 
The  position  difference  between  vertical  and  horizontal  edges  in  tne 
diagonal  test  pictures  occurs  because  on  diffused  edges,  large  operators 
are  required.  These  operators  fall  off  the  end  of  the  picture  if  the 
ramp  is  too  close  the  border.  The  cells  off  the  picture  are  set  to  the 
value  of  the  last  cell  on  the  border  of  the  picture.  This  is  not  a  good 
heuristic.  Therefore,  the  positional  inaccuracy  is  not  due  to  the  basic 
operator  itself.  We  should  not  be  operating  that  close  to  the  edge  of 
the  picture  with  large  operators. 

Our  adaptive  threshold  technique  is  about  at  the  same  level  of 
sensitivity  as  the  Burr  operator.  This  is  because  Burr  subtracts  cells 
in  multiples  of  threes  and  so  does  this  implementation  of  the  adaptive 
operator.  This  operator  has  less  noise  immunity  due  to  the  additive 
Laplacian  (these  operation  are  actually  digitized  second  derivatives, 
therefore  high  frequency  and  noise  enhancers).  On  the  other  hand,  it  is 
more  sensitive  in  real  pictures  (note  extra  definition  in  the  lower 
cylinder  of  the  metal  cylinders  picture). 

We  feel  the  difference-based  operator,  for  an  application  requiring 
the  detection  of  both  fine  detail  and  diffused  (shadow)  contours,  is  the 
overall  winner.  The  only  qualification  is  that  the  user  must  be  able  to 
select  an  accurate  noise  model  (set  of  thresholds  for  minimum 
significance).  Without  this,  the  results  go  down  under  the  noise 
easily. 
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2.7.  Conclusions 

The  overall  results  of  this  study  indicate  that  first,  difference 
base  edge  detection  is  not  only  the  easiest  to  implement,  most  efficient 
to  run,  but  also  in  most  cases  the  best,  when  coupled  to  a  reasonable 
edge  peak  selection  algorithm.  Second,  peak  selection  via  a  "global" 
method  (such  as  center  of  mass  or  absolute  maximum/minimum)  is  superior 
to  a  "relative"  method  (relative  maxima,  non-maxima  suppression). 
Thirdly,  several  edge  detector  sizes  are  superior  to  one  because  small 
ones  can  adapt  to  fine  structure  and  larger  ones  can  pick  out  harder  to 
"see"  gradual  intensity  changes.  Stable  Higher-level  processing  is 
based  on  solid,  accurate  low-level  measurement  (Figure  2.23). 


Rigfit  And  Left  Retina  laages,  Edges,  And 
Edges  Derived  From  Region  Boundaries 


dges  Derived  From  Region 
Figure  2.29 
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Chapter  3 

Intermediate  Vision  -  Vertex-String-Surface  Graphs  (V-3-3) 


3.1.  Introduction 

We  will  introduce  novel  algorithms  for  region  (surface) 
aggregation,  boundary  string  following,  and  vertex  detection  and 
reconstruction.  In  our  view,  visual  recognition  can  be  divided  into 
three  distinct  levels.  The  first  is  basic  measurement,  in  this  system 
edge  detection  3nd  image  sampling  (Chapter  2).  A  system  will  be 
fundamentally  limited  in  discrimination  power  by  the  dimensions  of 
measurement  chosen.  We  have  limited  ourselves  to  a  system  using  blacit 
and  white  or  color  images  (we  process  texture  in  a  limited  way),  with 
all  pertinent  image  changes  characterized  as  steps  (the  position, 
orientation,  and  transition  size  of  each  step  is  recorded).  Image 
dimensions  are  variable  from  ixlxt  bit  to  504x476x9  bits,  however  we 
typically  have  used  252x238x5  bits  (edge  detection  experiments  in 
Chapter  2)  or  252x233x9  bits  (following  chapters). 

The  second  phase  is  object-directed  segmentation,  or  intermediate 
vision.  In  this  phase,  we  combine  basic  measurements  into  primitive 
groupings  representing  major  object  components.  The  object  components 
generated  may  be  considered  as  objects  in  tneir  own  rignt,  however 
differ  from  learned  objects  because  their  interpretation  is  fixed. 
Basic  measurements  have  preprogrammed  form  and  preprogrammed 
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interpretation  (such  as  intensity  steps,  spots,  color  feature  vectors, 
intensities).  Object  components  have  preprogrammed  meanings,  but 
variable  forms  (regions  -  made  up  of  an  arbitrary  collection  of  adjacent 
feature  and  position  measurements,  boundary  strings  -  made  up  of  a 
collection  of  adjacent  intensity  step  points,  vertices  -  made  at  the 
coincidence  of  an  arbitrary  number  of  strings  endpoints). 

The  third  phase  is  object  formation  and  recognition.  This  will  be 
dealt  with  later  (Chapters  4  and  5).  Learned  objects  have  variable 
forms  and  variable  meanings.  In  practical  vision  systems,  there  are  no 
such  things  as  impossible  objects  [25].  All  objects  that  are  formed  by 
data  measured  from  the  real  world  are  possible  and  must  be  represented. 

Our  system  for  intermediate  vision  reads  edge  data,  transforms  this 
to  edge  and  region  data  (still  in  pixel  format).  These  two  forms  of 
information  may  be  passed  to  texture  processors  or  shape  processors. 
Texture  processes  transform  edge  information  (masked  by  region  data) 
into  smoothed  images  for  further  edge  detection.  Shape  processors 
superimpose  edge  data  from  multiple  sources  (region  differentiations, 
intensity  edges,  colored  image  edges,  or  texture  image  derived  edges), 
and  extract  contours  and  vertices  in  the  form  of  the 
vertex-string-surface  graph  structure  (V-S-S).  Further  3cene  to  scene 
matching  and  recognition  is  done  with  data  encoded  in  these  V-S-S 
structures  alone. 

3.2.  Related  Work 

Compared  to  work  In  low-level  vision,  relatively  little  work  has 
been  done  in  intermediate  vision.  The  work  of  Marr  on  occlusion,  depth 
extraction,  and  texture  is  significant  [31].  Motivated  by 
neurophysiological  results  Marr  proposes  measurements  in  the  form  of  a 
primal  sketch,  basically  a  map  of  edges.  The  measurements  in  the  primal 
sketch  are  then  correlated  together  for  various  texture  parameters, 
contours,  symmetries,  and  depth  primitives.  The  proposed  texture  model 
is  based  on  histograms  of  edges  and  connections  between  near  edges 
(histograms  of  either  directions  of  edges  and  connections  or  sizes  of 


edg9s  and  connections) .  The  critical  idea  in  Marr's  work  is  that  vision 
consists  of  forming  symbolic  descriptions  of  visual  forms  at  every  level 
of  representation,  not  simply  at  the  level  of  object  descriptions.  In 
tnis  way  Marr's  ideas  and  ours  about  object  components  are  very  similar 
even  though,  in  form  and  process,  they  are  somewhat  different  (wnile 
again  proposing  a  different  computational  attack,  Waltz  also  shares  this 
affinity  towards  symbolic  low-level  vision). 

Some  work  has  been  done  on  region  growing  algorithms  which 
incorporate  some  concept  of  figure  "goodness".  The  Brice  and  Fennema 
region  grower  merged  to  minimize  perimeter  growtn,  therefore  favoring 
compact  regions  [6].  The  region-based  system  of  Tenenbaum  used  a  priori 
information  to  partially  control  region  merging  [60].  Feldman  and 
Yakimovsky  also  have  pursued  region-based  techniques  which  incorporate 
semantics,  via  a  Baysian  strategy  [14], 

In  the  area  of  contour  extraction  and  codification  very  many  ideas 
have  been  examined.  Martelli  uses  an  optimization  technique,  coupled 
with  potentially  extensive  search  to  find  optimal  contours  [40].  Very 
many  chain  encoded  contour  techniques  have  been  used.  Techniques  based 
on  Hough  transforms  have  been  used  both  in  domains  consisting 
exclusively  of  straigi.t  line  segments  and  those  having  limited,  but  more 
general  segment  types  [46]. 

Corner  detection  in  most  cases,  has  been  done  by  inference  from 
contour  data  [16][47].  Some  work  has  been  done  on  direct  corner 
detection  [12].  We  have  found  that  any  attempt  to  directly  detect 
corners  in  image  data  is  relatively  insensitive.  In  both  corner 
detection  and  contour  codification,  the  idea  of  curvature  arises. 
Curvature  information  extracted  from  a  fixed  grid  system  is  both  a 
function  of  the  underlying  curve  and  the  image  sampling  system.  This 
interaction  has  been  studied  for  some  pixel  spaces  [5]. 

From  a  computational  point  of  view,  tne  most  common  algorithms  at 
the  intermediate  level  have  been  simple  3nd  sequential  [48].  However, 
some  have  used  searcnes  with  bacxup  [58],  Much  recent  interest  nas  been 
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shown  in  relaxation  algorithms  [54].  Primarily  those  with  probabilistic 
interactions  have  been  used.  Examples  of  symbolic  relaxation  have  been 
employed  at  a  higher  level  [63]. 

We  will  first  examine  region  formation  from  edge  point  data.  We 
will  then  examine  our  method  for  integrating  edge  data  from  several 
sources  with  region  data  for  the  boundary-vertex  extraction  process. 
V-S-3  graph  formation  will  be  discussed,  followed  by  some  notes  on 
multi-color  processing  and  texture  processing. 

3.3.  Regions 

Artifact  vision  is  object  oriented.  Objects  may  be  viewed  as 
collections  of  surfaces  which  enclose  volume.  Each  of  these  surfaces 
project  onto  the  i^uige  space  as  one  or  more  two  dimensional  regions, 
each  having  relatively  homogeneous  characteristics.  Most  commonly 
observed  objects  have  surfaces  which  are  compact  (convex-connected),  or 
have  surfaces  which  are  easily  decomposed  into  compact  subsurfaces. 
This  indicates  that  our  algorithms  for  region  aggregation  on  the  image 
space  should,  whenever  possible,  be  biased  in  favor  of  compactness. 

Biological  vision  systems  seem  to  be  sensitive  to  figure-ground 
relationships.  By  this,  we  mean  that  regions  in  the  two  dimensional 
image  space  can  be  ordered  by  relative  embedding  relationships,  which 
correspond  to  object  level-background  level  separations.  Objects  are 
figures  with  respect  to  the  actual  background.  Painted  surface  details 
are  figures  with  respect  to  major  object  surfaces.  Highlights  are 
bright  figures  with  respect  to  surface  details  or  object  surfaces. 
These  embedding  relationships  can  sometimes  be  thought  of  as  links 
between  levels  in  the  dendograms  generated  by  a  traditional  intensity 
region  merging  sequence  [29].  In  general  however,  these  relations  are 
purely  geometrical.  For  object  component  region  generation  embedding 
relations  should  be  recovered,  and  used  in  the  region  growing  scheme  if 
possible . 
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Strict  intensity-baaed  region  growing  schemes  have  a  tendency  to 
either  generate  large  numbers  of  regions  when  severe  thresholds  are 
used,  or  a  smaller  number  of  regions,  where  each  actually  corresponds  to 
more  than  one  object  surface,  when  loose  thresholds  are  allowed  (Figure 
3.1b-c).  In  fact,  places  where  several  object  surfaces  tend  to  merge  in 
region-based  processing,  are  also  places  where  edge  detection  systems 
fail  to  find  high  confidence  edge  points  (Figure  3. id). 


A)  Sample  Scene; 

B)  Regions  From  Strict  Threshold  System; 

C)  Regions  From  Loose  Threshold  System; 

D)  High  Confidence  Edge  Points. 

Figure  3.1 

To  remedy  the  problem  of  low  confidence  local  data,  some  form  of  a 
priori  information  needs  to  be  incorporated.  Sometimes  scene  related 
information  has  been  used,  however,  this  restricts  generality  from  scene 
to  scene.  We  have  chosen  to  build  our  region  aggregation  system  around 
the  idea  of  figure  "goodness".  That  is,  we  make  regions  'which  tend  to 
be  locally  compact. 
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3.4.  Homogeneity 

We  Have  defined  Homogeneity  in  a  domain  or  area  to  mean  absolute 
absence  of  High  confidence  edge  points.  In  short,  our  region 
aggregation  scheme  consists  of  assigning  unique  region  numbers  to  each 
connected  Homogeneous  area.  All  non-Homogeneous  areas  are  then 
classified  by  several  schemes  to  refine  the  shapes  of  the  homogeneous 
regions . 

The  larger  the  local  domain  used  for  homogeneity  computations,  the 
larger  the  length  of  missing  edge  data  that  can  be  completed  (Figure 
3.2).  However,  the  larger  the  operator  size,  the  more  sharp  curvatures 
are  modified  (Figure  3.3).  We  use  digitized  approximations  to  disks  for 
the  basic  homogeneity  operations  for  computational  ease  and  to  minimize 
boundary  shape  biases  (Figure  3.4). 


"C> 


Homogeneity  Operator  Passing  Through  A 
Break  In  Edge  Data 
Figure  3.2 
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Note  the  arrangement  of  pixels  and  edge  points  in  Figure  3.5. 
Edges  some  in  two  kinds,  vertical  and  horizontal,  each  between  pixels. 
The  edge  positions  inside  the  domain  boundaries  are  checked  in  the 
homogeneity  computations. 
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Edge  and  Pixel  Coordinates 
Figure  3.5 


3.5.  Region  Sweeping 

To  form  regions,  we  sweep  a  homogeneity  operator  (one  from  the  set 
in  Figure  3.1*),  uniformly  over  the  image  space.  As  edge  points  enter 
the  domain,  a  counter  is  incremented.  As  they  leave,  the  same  counter 
is  decremented.  The  cells  in  the  homogeneity  disk  are  marked  (set  to  a 
region  number)  if  the  counter  is  zero  (l.e.  centered  on  an  area 
including  no  edges).  If  the  counter  is  non-zero,  the  next  homogeneity 
position  is  evaluated.  As  an  efficiency,  the  image  space  is  divided 
into  32x32  sectcfrs.  After  all  the  cells  in  a  sector  have  been  marked,  a 
bit  in  the  corresponding  location  in  a  32x32  map  is  set,  preventing 
further  evaluations  within  the  sector. 
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The  marking  operation  also  involves  generation  of  region  numbers 
(these  numbers  are  the  marks,  recorded  in  a  "region"  image,  where  each 
pixel  contains  an  index  into  the  region  property  table) .  There  are 
three  methods  used  for  generating  numbers.  The  easiest  is  used  if  none 
of  the  four  cells  adjacent  to  t-he  central  cell  has  been  assigned  a 
number  (Figure  3.6  -  We  do  not  consider  cells  on  the  diagonal  to  be 
adjacent  because  there  is  not  a  shared  edge  position).  In  this  case, 
the  next  sequential  number  (and  region  table  entry)  is  assigned. 
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Cell  Adjacency 
Figure  3-6 

The  second  easiest  case  occurs  when  only  one  region  number  occurs 
in  any  or  all  of  the  adjacent  cells.  In  this  case,  the  region  number 
assigned  is  this  number  (thus  smearing  the  region  along).  The  difficult 
case  occurs  when  there  are  several  different  numbers  from  which  to 
choose.  In  this  last  case,  all  the  regions  Indexed  that  have  been 
formed  previously  in  this  sweep  (or  pass),  are  "merged".  The  merging 
consists  of  selscting  one  of  tne  indices  as  a  root  index  (this  becomes 
the  index  to  be  used  for  new  marking),  and  modifying  the  table  entries 
for  all  the  other  region  indices  to  "point"  to  the  root  entry. 
Therefore  all  the  non-root  entries  become  "aliases"  for  the  root  number. 
In  this  way  we  avoid  renumbering  previously  marked  cells. 

If  some  of  the  region  indices  adjacent  to  the  central  cell 
represent  regions  formed  in  a  sweep  other  than  the  one  currently  being 
performed,  simple  merging  would  not  always  be  appropriate  (next 
section).  In  these  cases,  an  association  is  stored  between  the  old 
region  entries  (from  prior  sweeps)  and  the  new  one  (formed  this  sweep). 
When  a  single  pass  is  finished  (an  operator  has  been  evaluated  over 
every  unmarked  cell),  these  associations  are  examined. 


It  should  be  noted  that  region  number  propagation  occurs  tnrough 
only  the  directly  adjacent  cells,  however,  cell  setting  occurs  over  the 
entire  homogeneity  disk  area.  This  little  trick  makes  it  necessary  for 
any  edge  string  breaks  to  be  on  the  order  of  the  diameter  of  tne 
homogeneity  disk  before  the  two  different  regions  are  jointed.  Also  it 
causes  the  remote  possibility  of  setting  noncontiguous  areas  to  the  same 
region  numbers . 

The  sweeping  operation  occurs  for  each  different  operator  size  from 
the  largest  to  the  smallest.  Following  each  sweep  pass,  region  table 
space  is  reclaimed  and  cells  are  reindexed  ("aliases"  are  compressed 
out).  The  larger  the  operator  allowed  to  aggregate  a  given  region,  the 
larger  the  breatcs  in  edge  data  that  can  be  tolerated.  Also,  some 
classes  of  subjective  contours  are  generated  because  of  compact  area 
enclosure  (Figure  3.7). 


Subjective  and  Real  Contour  Completions 
Figure  3.7 
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3.6.  Region  Noise,  Ambiguity,  and  Corners 

There  are  some  striking  similarities  between  this  scheme  for  region 
aggregation  and  the  edge  detection  techniques  previously  discussed 
(Chapter  2).  In  the  edge  detection  scheme,  small  detectors  control  the 
process.  In  the  region  system,  larger  operators  play  the  same  role.  In 
the  edge  detection  scheme,  small  operator  outputs  are  evaluated  against 
a  noise  model  to  determine  their  significance.  Similarly,  the  regions 
formed  after  a  sweep  operation  need  to  be  checked  for  significance. 
This  is  done  in  two  ways.  The  first  and  most  obvious  requires  regions 
to  be  of  significant  size.  The  size  parameter  we  have  used  in  our 
experiments  is  nominally  50  cells  (out  of  252x238).  This  significance 
dimension  tends  to  become  important  for  sweeps  with  smaller  operators 
(size  3,2,  or  1). 

A  slightly  less  obvious  significance  parameter  arises  from  tne 
relative  sizes  of  the  homogeneity  operator  and  the  region  aggregated  by 
this  operator.  If  the  operator  is  too  large  (diameter  approximately  the 
same  as  the  smallest  body  crossing  chord),  one  region  may  be  broken  into 
several  parts  (Figure  3.8).  It  is  easier  to  get  such  things  right  than 
to  patch  them  up  later  so  we  have  defined  the  following  parameter: 

3.6.1.  f>  m.  (4*  ar*</peiriMeter')/ (  S>je*z) 

If  this  ratio  is  greater  than  a  threshold,  the  estimated  region  diameter 
(4*area/perimeter)  is  significantly  greater  than  the  operator  diameter 
(size*2),  and  therefore  this  region  is  significant.  This  significance 
test  tends  to  eliminate  undesirable  large  operator  aggregated  regions. 
The  threshold  we  use  is  nominally  1.0. 


56 


A  Segion  Broken  By  A  Large  Homogeneity  Operator 

Figure  3.3 

Following  each  region  generation  sweep,  during  region  table  space 
compression,  we  have  the  opportunity  to  reassign  region  cells  to 
"unprocessed".  If  a  region  is  computed  to  be  insignificant,  its  cells 
are  so  marked,  and  the  corresponding  entry  in  the  region  tables  is 
eliminated . 

If  sharp  curvatures  exist  in  the  bounding  contours  of  a  region, 
there  will  be  areas  at  these  corners  where  large  operators  will  fail. 
Successively  smaller  operators  will  be  able  to  succeed  in  such  areas. 
However,  the  smaller  operators  will  tend  to  give  incorrect  results  due 
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to  the  fact  that  near  these  corners  high  confidence  edges  will  be  absent 
(Figure  3-9).  It  will  be  generally  difficult  to  determine  where  tne 
small  regions  at  corners  should  be  incorporated.  We  would  merge  them 
into  an  adjacent  region  if  such  a  merger  was  unique  (i.e.  the  small 
region  is  associated  to  only  one  other  region  -  Figure  3 • 1 0 ) ,  but  when 
uniqueness  is  not  met  the  ambiguity  is  not  easily  resolved.  In  this 
ambiguous  case  cells  in  the  region  are  marked  as  ambiguous,  and  left  to 
be  processed  in  the  final  refinement  passes  following  the  sweep  of  the 
homogeneity  operator. 


Small  Region  Near  A  Corner  'With 
Ambiguous  Possible  Merges 
Figure  3.9 
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Small  Region  Near  A  Corner  With 
Only  One  Possible  Merge 
Figure  3* 10 

By  now  the  reader  has  probably  realized  that  the  post  pass 
following  each  region  sweep  is  actually  a  bundle  of  Kludges  (more 
politely,  heuristics),  designed  to  take  care  of  some  of  the  more 
pathological  problems  posed  by  the  homogeneity  operations.  The  last 
operation  done  in  this  post  pass  is  to  Kill  any  small  group  of  cells  set 
to  a  particular  region  number,  but  not  connected  to  the  main  mass  of 
that  region.  Recall  that  this  defect  comes  about  from  the  region 
setting  and  propagation  method.  An  advantage  of  simple  convex  shapes 
like  disks  is  that  the  problem  of  these  noncontiguous  sets  is  minimized. 

3.7.  Final  Region  Refinement 

After  sweeping  with  all  available  homogeneity  operators,  some  cells 
will  either  be  'unmarked  (near  seme  region  bounding  contours)  or  manced 
as  ambiguous.  We  want  ail  cells  classified  as  part  of  a  region, 
therefore  we  have  included  a  series  of  passes  wnich  assign  ceils 
adjacent  to  established  regions  to  the  region  wnich  tne  cells  are 


59 


nearest  to  in  intensity  space.  This  is  similar  to  traditional  merging, 
intensity-based  region  growing,  with  the  exception  that  no  new  regions 
are  formed.  Usually  no  more  that  15  passes  are  required  on  5  percent  of 
the  image  space  (due  to  the  sector  marking  scheme),  or  approximately  15 
seconds  of  processing  (quite  fast  compared  to  region  generation).  This 
intensity  growing  algorithm  is  capable  of  recovering  contours  near 
vertices  quite  well,  certainly  better  that  the  original  edge  detectors. 

The  final  result  of  the  region  system  is  a  picture  of  region 
indices  and  a  list  of  regions  and  region  properties  (Figure  3.H).  We 
accumulate  area,  perimeter,  centers  of  mass,  intensity  average, 
intensity  standard  deviation,  roughness,  and  order  (operator  size  which 
formed  the  region).  Regions  are  analyzed  to  determine  their  relative 
levels  of  embedding.  We  are  primarily  concerned  with  level-0  embedding 
which  represents  the  main  background  (such  as  a  table  top  or  floor), 
level-1  embedding  which  represents  all  object  details  (this  is  all 
levels  between  level-0  and  level-2),  and  level-2  embedding  which 
represents  highlights  and  the  innermost  surface  details  (this  level  may 
not  be  present,  and  may  not  be  tangent  to  level-0  regions).  Figure  2.23 
shows  two  views  of  a  block  (A),  the  edges  detected  (B),  and  tne 
boundaries  which  correspond  to  region  differences  (C). 
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Region  Token  and  List  Structure 
Figure  3.H 

3.9.  Image,  Edge,  Region  Data  Merging 

The  construction  of  boundaries  and  vertices  can  be  divided  into 
tnree  main  processes.  The  first  is  merging  input  edge  data  from  several 
sources:  the  original  gray  scale  image,  the  edge  detection  data,  the 
region  maps,  and  texture  processors  (after  edge  detection  on  texture 
smoothed  images) .  The  second  is  actual  boundary  and  vertex 
codification .  Last  is  smoothing  and  reorganization  of  data  into  the 
final  vertex-string-surface  (V-3-S)  graph  structure. 

The  merge  process  reads  the  image,  the  edge  images  (these  have  one 
edge  position  for  each  pixel  interface  in  the  original  image) ,  and  the 
region  image  to  produce  an  edge  list.  Each  point  in  the  output  edge 
list  may  be  accessed  directly  by  its  original  position  on  the  retinal 
space  by  a  simple  array  reference  (each  'L  coordinate  has  an  array  slot 
pointing  to  an  X  coordinate  sorted  list  of  endpoints)  followed  by  a 
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binary  chop  (locates  the  edge  in  a  selected  X  coordinate  sorted  edge 
list).  The  region  data  is  used  as  a  template  for  edge  data.  Any 
position  in  the  region  picture  where  region  numbers  differ,  an  edge  is 
indicated.  If  a  real  edge  point  exists  near  this  position  (i.e.  is 
marked  near  by  orthogonally  from  the  corresponding  position  in  the  edge 
images),  an  edge  is  entered  into  the  output  lists  as  a  region  separator 
edge  (region  image  indicates  two  separate  regions)  and  inherits  the 
properties  of  the  real  edge.  If  no  real  edge  is  near  the  region 
separation,  a  subjective  edge  is  entered  into  to  output  lists.  In  this 
way,  all  region  separator  edges  are  recorded  insuring  completely  bounded 
regions . 

Any  real  edge  points  marked,  in  the  edge  images,  but  not  absorbed 
by  region  separations  are  output  as  non-separator,  real  edges  in  the 
output  lists.  These  can  correspond  to  incomplete  boundaries,  or 
internal  creases  on  a  surface.  Non-separator  edges  are  more  suspect 
than  separator  edges  as  possible  noise  points.  Subjective,  separator 
edges  are  more  suspect  in  positional  information,  because  the  region 
aggregation  process  simply  put  them  where  it  wanted  them. 

The  gray  scale  image  is  used  to  compute  the  intensity  average 
immediately  on  both  sides  of  edge  points.  The  intensity  is  sampled  at 
1/2  the  transition  size  of  the  edge  point  on  both  sides  of  the  edge. 
Subjective  edges  are  taken  to  have  a  transition  size  of  zero. 

3.9.  Vertex  Detection  and  Boundary  Codification 

Vertices  are  really  discontinuities  in  several  dimensions.  They 
are  discontinuities  in  image  intensity  (as  are  edge  points).  They  are 
discontinuities  in  direction  (i.e.  sharp  boundary  inflections),  and  in 
topology  (separating  two  or  more  regions,  two  or  more  strings).  By 
string  we  mean  "macro"  edge  points.  That  is,  chains  of  edges  which 
connect  end  to  end  and  have  only  two  end  points  (terminating  in 
vertices,  looping  back  to  each  other,  or  terminating  in  free  space).  To 
find  vertices  we  really  want  to  find  points  where  all  the  above 
discontinuities  coexist. 


■a. 
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First  we  attempt  to  attach  each  edge  endpoint  to  another  edge 
endpoint  in  a  reflexive  manner.  By  this  we  mean,  if  endpoint  k  can  only 
be  connected  to  endpoint  B,  and  endpoint  B  can  only  be  connected  to 
endpoint  k,  they  can  be  connected.  k  connection  neighborhood  is 
expanded  about  each  edge  endpoint  as  in  Figure  3*12.  Possible  edge 
connections  are  checked  at  each  order  of  expansion.  When  reinforcing 
connections  as  in  Figure  3 . 1 3 ,  are  found,  reflexive  linking  occurs. 
Edge  endpoints  involved  in  more  complicated  connection  geometries  are 
flagged  as  possible  entries  into  vertices.  k  new  vertex  number  is 
assigned  for  each  of  these  areas  of  connection  ambiguity.  In  effect, 
these  possible  vertices  are  detected  where  topological  discontinuity  is 
locally  indicated. 
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noise  generated  or  real.  Some  that  are  real  will  not  be  detected  by  the 
simple  topological  discontinuity  criterion.  For  this  reason,  2-degree 
vertices  are  also  discarded,  to  be  regenerated  at  points  of  high 
curvature  later.  The  appropriate  string  concatenations  are  then 
performed.  The  vertices  that  remain  are  3-degree  and  higher,  and 
relatively  certain,  but  their  positions  are  not  accurately  known. 

3.10.  Vertex  Reconstruction  and  String  Smoothing 

Reconstruction  and  smoothing  is  a  two  pass  operation.  We  first 
compute  accurate  positions  for  ^-degree  and  higher  vertices.  We  then 
smooth  and  detect  2-degree  vertices  at  points  of  maximum  curvature.  The 
reconstruction  and  smoothing  is  then  performed  again  using  all  vertices. 

Vertex  reconstruction  consists  of  computing  the  perpendicular 
least-squares  line  fit*  for  each  string  entering  the  vertex. 

3.9.1.  ^  *  i  Z  <l 

^  J'  *  *-  'J>  <X* )  JI  )  -  Coordinates 

— -  Number  of 

K  ■  Edge  Points 

cc«(7‘-  f)  -  (  y"2) 

T  *  ~ cc/j Ct 4sx 

A  •/TTr  s=[-sfr<  s]Ji-r  C-At^jSj 

for  line:  At  «  C, 

Two  linearly  independent  equations  are  all  that  are  required  to 
position  a  vertex.  When  more  are  available  (3-degree  vertex  or  more), 
the  system  is  solved  for  the  best  intersection  point  in  a  least-squares 
sense. 


*  These  equations  (3-9.1)  and  several  others  are  courtesy  of  Rod 
Fletcher. 
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3.9.2.  [A  *  *  6,ym  £,]  uJ, 

[At*  +  &»•)•  * I ]«)X 

[Am*  *  •  fi,]  *>„ 

n  equations  n  weights 


Zn-.A* 

1 

•  — 

X  HA  £; 

>»«,* 

Solve: 

TEutiAiBj  £  tMj  B; 
J’1,*  «•*,» 

3 

“  - 

When  vertex  positions  are  fixed,  a  least  squares  line  is  fitted  at 
each  point  in  the  strings  (Figure  3. 11*).  The  A  and  B  coefficients  are 
used  to  compute  a  tangent  angle.  These  angles  are  used  to  compute 
change  in  angle  with  respect  to  arc  length,  or  curvature  along  the 
strings.  Two  degree  vertices  are  recovered  by  examining  these 
curvatures.  As  in  linear  difference  edge  detection,  an  edge  (in  this 
case  a  2-degree  vertex)  is  located  at  points  where  the  difference 
(curvature)  is  greater  than  a  threshold  T  and  maximum  over  a  specified 
domain  (Figure  3.15).  All  the  tricKs  used  for  edge  detection  are 
applicable  in  this  situation  when  scaled  down  to  a  single  dimension 
(along  the  string).  In  practice,  we  may  use  a  single  sized  difference 
operator  with  a  single  threshold  and  get  acceptable  results. 
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After  the  2-degree  vertices  are  marked,  we  perform  the  tne  same 
reconstruction  algorithm  on  them,  followed  again  by  the  linear  equation 
fitting  about  each  string  point.  This  time  however,  each  string  point 
is  repositioned  along  the  perpendicular  to  the  least  squares  line  onto 
that  line,  thus  smoothed  (Figure  3.16). 


Edges  Along  A  String  Moved  During  Smoothing 
Figure  3-16 


Sometimes,  incorrect  highly  subjective  edge  strings  cause  false 
vertices  to  fora,  perturbing  the  positions  of  nearby  real  vertices  (due 
to  our  perhaps  too  powerful  smoothing  and  vertex  repositioning  machinery 
-  Figure  3 .17).  This  is  corrected  by  a  post  pass  which  exaaines  all 
edge  strings  that  are  highly  subjective  (ones  formed  solely  by  our 
region  aggregation  system,  without  adequate  edge  data  corroboration). 
When  required,  these  highly  subjective  strings  are  deleted,  and  the 
vertex  positions  in  the  vicinity  are  recomputed.  The  region  data  base 
remains  unchanged  by  these  deletions,  therefore  some  regions  may  only  be 
partially  surrounded  by  contour  data.  The  missing  sections  correspond 
to  areas  where  connection  is  assumed,  but  wnere  the  exact  nature  of  that 
connection  is  in  question. 


Subjective  Contour  Foraing  A  False  Vertex 
Figure  3. w 

At  this  point  we  now  have  a  data  base  containing  all  vertices 
linked  to  the  strings  which  form  then,  along  with  all  entry  angles.  We 
have  the  strings  as  property  lists  containing  the  average  transition 
sizes  of  their  edges,  the  average  Intensities  to  the  left  and  right  of 
their  edges,  the  variations  in  intensity  to  the  left  and  right  of  their 
edges,  the  variations  in  the  differences  across  their  edges,  the  regions 
separated  by  their  edges,  and  indices  to  ordered  lists  of  edge  positions 
and  tangents,  to  describe  the  string  shapes.  Lastly,  we  have  a  region 
list,  containing  the  properties  coaputed  by  the  region  aggregation 
sub-systea. 
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Froa  this  point  forward,  all  processing  is  removed  froa  the 
original  picture  space.  All  successive  operations  are  performed  on  the 
vertex-string-surface  graph  structure.  Figures  3.18-3.24  show  the 
images,  iaages  after  edge  detection,  edges  after  vertex  marking  (X's 
aark  edges  entering  possible  vertices),  vertex  initial  and  reconstructed 
positions,  distance  moved  by  edges  during  smoothing,  the  V-S-S 
structure,  and  the  V-S-S  structure  after  a  post  pass  (using  Raaer's 
method  to  chop  curved  segments). 


Images 
Figure  3.18 
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Edges  For  Figure  3.18 
Figure  3.19 


Vertex  Marked  Edges 
Figure  3.20 


Vertex  Initial  (Dots)  and  Reconstructed  (X) 
Positions 
Figure  3.21 
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Distance  Edges  Are  Moved  During  Smoothing 
Figure  3.22 


V-S-S  Graph  Derived  Plot 
Figure  3.23 


V-S-S  Graph  After  Curved  Segment  Chopping 
Figure  3.24 
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3.11.  Notes  on  Additional  Feature  Distensions 

In  most  of  the  examples  discussed  gray  scale  Intensity  data  has 
been  processed.  We,  for  the  most  part,  restrict  ourselves  to  this 
variety  of  data  because  it  is  the  simplest  to  obtain.  However,  the 
processing  we  have  so  far  described  is  not  limited  to  simple  intensity. 

Tne  most  obvious  form  of  multispectral  data  is  color.  We  obtrain 
color  information  by  taxing  three  pictures  through  red,  green,  and  blue 
color  filters.  Many  workers  have  proposed  the  combination  of  these 
spectra  into  hue  and  saturation.  We  choose  not  to  do  so  because  our 
approach  is  to  extract  critical  discontinuities  (edges)  and  organize 
them  (form  regions,  contours,  and  vertices,  each  with  property  vectors). 
This  can  be  done  in  the  R-G-B  space  as  well  as  in  the  hue-saturation 
one.  In  the  R-G-B  space,  noise  models  are  easier  because  they  are 
exactly  the  same  as  for  intensity  alone  (directly  a  result  of 
digitization  and  sensor-  noise).  If  R-G-B  is  transformed  to 
hue-saturation,  noise  functions  are  transformed  and  since  this  is  a 
non-linear  transformation,  the  noise  transformation  is  non-linear.  For 
this  reason  we  choose,  in  general,  to  worX  with  data  in  the  form 
directly  generated  by  sensors. 

Any  discontinuities,  in  any  of  the  feature  dimensions  represents  a 
significant  point.  Therefore,  these  discontinuities  can  be  extracted 
independently  and  superimposed.  In  the  set  of  all  discontinuities,  each 
individual  one  inherits  properties  from  each  feature  dimension  in  which 
it  occurs  (i.e.  the  tacit  assumption  here  is  that  if  discontinuities 
co-occur  in  different  feature  spaces,  bu'„  at  nearly  the  same  spatial 
position,  they  are  all  generated  by  the  same  underlying  process).  In 
this  way,  any  system  can  not  only  be  easily  expanded  to  process  color 
(i.e.  light  sampled  in  three  frequency  bands),  but  a  wide  variety  of 
alternate  feature  domains  also.  Figures  3.25-3.25  show  two  color 
pictures,  and  the  corresponding  red,  green,  blue,  and  composite  edge 
pictures . 
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Thig  image  is  from 
[45],  reduced  from 
800x400x8  bits  to 
252x238x9  bits. 

*  Our  images  are 
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[45].  We  process 
the  color  negative 
for  this  image. 

**  An  unavoidable 
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in  the  edge 
display. 
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3.12.  Notea  on  Spot  and  Edge  Textures 

Tnia  system  of  region,  vertex  and  string  encoding  does  not  directly 
process  texture  information.  However,  all  regions  wnich  can  not  be 
processed  as  smootn  are  marked  by  the  region  processor  as  rough.  These 
regions  may  be  passed  directly  (roughness  being  a  region  feature  vector 
component),  or  may  be  transformed  to  "smooth"  by  some  texture  smootning 
process.  We  believe  that  "texture"  is  actually  high-density  shape 
information.  Snape  information  with  higher  density  than  can  be  directly 
handled  by  the  main  line  shape  processes. 

When  a  region  has  been  too  "busy"  for  the  region  process  to  handle, 
we  use  one  of  several  primitive  shape  smoothers  to  form  images  which 
contain  null  areas  for  already  smooth  regions,  smooth,  non-null  areas 
for  regions  having  texture  types  compatible  with  the  snape  smoother,  and 
"busy"  areas  for  regions  for  which  the  shape  smoother  is  in  applicable. 

These  smoothed  images  may  then  be  passed  bacx  through  the  edge 
detection  system,  for  edges  between  the  smoothed  areas,  and  these  edges 
precipitated  into  the  global  edge  images  (busy  edges  in  smoothed  regions 
are  masked  away).  After  all  the  known  shape  smoothing  algorithms  have 
been  passed  over  rough  regions,  any  still  rough  areas  can  be  can  be  set 
to  smooth  and  called  "none-of-the-above"  rough  regions  (Figure  3.27). 


Textured 

Scenes 


After  Edge 
Detection 


After  Separating 
Smooth  And 
Rough 


After  Separating 

Different 

Orientations 


Texture  Processing 
Figure  3.27 


In  this  way ,  we  are  characterizing  texture  as  microscopic  shape 
information,  in  high  enough  density  configurations  to  be  characterized 
by  statistical  models  rather  than  by  specific  graphical  models.  This 
approach  allows  ouch  aore  integration  between  shape  processing  and 
texture  processing.  £ach  of  our  smoother  processes  may  be  thought  of  as 
a  oatcher  for  a  given  edge  and  edge  connection  histogram  (Figure  3.23), 
as  described  by  Marr.  Mar r  proposed  that  sucn  histogram 
characterizations  of  textures  could  be  sufficient  to  mimic  human  texture 
discrimination . 
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Figure  3.28 
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We  claim  that  texture-like  phenomenon  can  occur  at  any  level  of 
shape  representation,  and  therefore  cannot  be  considered  by  one  simple 
formalism  (like  spatial  spectral  analysis).  Also,  texture  processing 
can  best  be  understood  as  nigh  density  shape  detail,  too  dense  to 
process  explicitly.  Therefore,  texture  can  best  be  studied  in  the 
context  of  shape  extraction  and  representation. 

3.13.  Conclusions 

We  have  described  a  system  which  operates  on  single  images  to  form 
complete  vertex-string-surface  graphs.  Surfaces  are  aggregated  prior  to 
contour  processing  by  using  the  idea  of  homogeneity  as  the  basis  for 
region  segregation.  This  method  allows  edge  closure  and  the  generation 
of  some  classes  of  subjective  contours. 

The  contour-vertex  processing  described  finds  vertices  before 
contours,  by  examining  local  edge  connect  conditions  (reflexive 
connections).  This  method  allows  contours  to  be  labeled  without  path 
following  (we  use  a  pure  scan  line  algorithm).  Vertex  reconstruction 
and  contour  smoothing  is  performed  via  a  tangent  line  based  technique. 
It  is  shown  that  detection  of  curvature  discontinuities  along  contours 
is  analogous  to  the  detection  of  edges  in  image  spaces. 

Textured  areas  are  segregated  from  smooth  areas  for  optional 
additional  processing.  This  consists  of  a  variety  of  texture  smoothing 
processes.  Each  of  these,  for  a  given  class  of  textures,  will  produce 
smoothed  Images  from  rough  ones.  The  various  smoothed  Images  may  be 
edge  detected  and  the  results  superimposed  over  previously  detected 
smooth  region  boundaries.  In  the  same  way  color  images  can  be 
processed.  The  different  color  spectra  are  Individually  examined  for 
edges  with  all  detected  edges  being  marged  upon  completion. 
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Chapter  4 

Half  Chunks,  Depth,  and  Motion  Correlation 


4,1.  Introduction 

Detection  of  scene  change  has  been  performed  via  bulk  correlation 
techniques  which  nominally  reduce  to  picture  subtraction  or  convolution, 
followed  by  segmentation  and  characterization  of  difference  areas.  In 
these  approaches  "difference"  is  examined  very  closely  to  the  feature 
level  (i.e.  difference  in  intensities,  or  intensity  statistics).  Jain, 
Milltzer,  and  Nagel  [27]  have  used  a  method  which  compares  local 
statistics  to  determine  areas  in  scenes  where  change  has  occurred. 
Potter  [49]  has  used  image  differences  for  object  segmentation.  Gennery 
and  Burr  have  looked  at  bulk  depth  correlations,  Gennery  in  outdoor 
scene  contexts  and  Burr  for  machine  parts.  In  this  approach  patches  of 
one  image  are  bulk  correlated  along  a  path  of  probable  parralax  in  the 
matching  image  (a  convolution  operation).  Marr  does  depth  computation 
by  matching  symbolic  entitles  from  left  image  to  right  image,  minimizing 
depth  discontinuities  [35].  We  have  approached  the  problem  of  change 
correlations  in  even  a  more  symbolic  manner.  We  look  for  similarities 
in  the  structure  of  abstract  forms  produced  from  stereoscopic  motion 
picture  sequences.  This  is  similar  to  some  of  the  modeling  work  of 
Baker,  Baumgart,  and  Burr,  except  we  do  not  restrict  the  variety  of 
motion  allowed  for  objects. 
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We  acquire  our  structural  information  in  the  form  of  tf-S-S  graphs, 
one  graph  per  input  image.  These  graphs  are  re-encoded  as  half  chunk 
graphs  and  compared  for  depth  and  motion  cues.  The  first  variety  of 
graph  is  similar  to  that  used  oy  Underwood  [62],  however  we  encode  more 
complete  topological  and  geometrical  information.  The  second  kind  of 
graph  is  entirely  new.  The  graphs  of  several  different  images  are 
matched,  binding  atoms  and  links  of  common  types  together.  In  a  similar 
system,  Dudani  forms  matches  in  vertex  directed  way  [11].  Our  matching 
scheme  tends  to  be  guided  by  points  of  high  curvature  or  significant 
topological  properties,  but  only  because  these  have  high  discrimination 
value  (tend  to  form  unique  matches). 

This  approach  is  desirable  for  several  reasons.  First,  by  matching 
on  half  chunk  features,  computational  effort  can  be  reduced  by  selecting 
pairs  of  atoms  (or  links)  for  association,  consistent  with  uniquely 
matched  "seed"  atoms  (links).  Such  a  selection  procedure  assumes  that 
some  structural  features  (such  as  descriptive  contour  curvatures, 
vertices,  or  types  of  intensity  variations),  are  so  descriptive  that 
they  can  only  be  matched  to  one  corresponding  feature  from  another 
graph,  and  that  this  matching  is  correct.  This  condition  can  be 
guaranteed  by  deferring  image  matching  to  a  level  at  which  each 
"component"  to  be  matched  is  a  significant  part  of  a  wnole  object. 

Tne  second  advantage  we  get  from  matching  half  chunks  is  that  the 
various  atom  and  link  bindings  wnich  we  obtain  are  easily  interpreted  in 
terms  of  rigid  body  transforms  of  object  parts  and  eventually  whole 
objects.  Each  aggregation  of  bindings  around  a  given  unique  seed  match 
gives  rise  to  transforms  from  one  subgraph  to  another  directly. 

Finally,  our  techniques  are  equally  applicable  to  depth,  motion, 
and  image  registration  correlations.  Depth  correlations  primarily 
differ  from  motion  ones  by  the  allowable  transform  taking  one  graph  (the 
right  one)  to  the  other  (the  left  one).  Since  the  depth  and  motion 
transforms  are  easier  than  the  registration  ones,  we  have  included  them 
in  our  matching  procedures. 


4.2.  Graph  Re-encoding  (V-3-S  to  H-C) 

In  generating  the  single  image  graphs  we  found  it  useful  to 
segregate  the  various  components  by  their  geometrical  types  (i.e. 
vertices,  strings  and  regions).  We  found  this  convenient  because  each 
of  the  different  types  naturally  surfaced  from  our  data  during  different 
processing  steps  (small  strings  or  edges  first,  regions  next,  long 
strings  and  tentative  vertices  after  regions,  and  finally  true 
vertices) . 

For  the  purposes  of  graph  matching,  we  desired  a  uniform  encoding 
for  our  graphs,  in  which  all  tne  information  encoded  into  the  V-3-S 
graphs  could  be  used,  but  in  more  normalized  form  (Figure  4.1).  To  this 
end  we  have  invented  the  "half  chunk"  (H-C)  graph.  Each  "half  chunk" 
represents  a  fragment  of  one  half  of  a  bounding  contour  for  a  region  (or 
a  length  along  a  string).  Geometrically  it  represents  a  single  change 
in  boundary  curvature  (or  in  the  case  of  a  vertex,  a  tangent  angle 
discontinuity)  at  a  point  in  space  (3-space  or  2-space),  and  a  tangent 
line  at  that  point.  Topologically,  it  encodes  rive  connection  types, 
one  across-region  link  (the  R-link),  two  endpoint  extension  links  (the 
E-links),  and  two  across-boundary  links  (the  X-links,  Figure  4.2).  The 
half  chunk  also  carries  a  property  list  composed  of  features  computed 
from  the  region  and  string  feature  vector  lists. 


Each  half  cnunk  could  carry  a  completely  arbitrary  floating  point 
feature  vector  recording  things  like  region  color,  intensity  statistics, 
and  contour  diffusion  (degree  of  blurring).  However,  for  complex  scenes 
this  would  be  more  than  we  require.  We  want  to  make  and  record 
decisions  about  the  various  feature  dimensions,  rather  than  record 
actual  feature  vectors.  An  array  of  decision  features  are  computed  from 
feature  components  encoded  in  V-S-3  atoms.  The  noise  statistics  for 
these  decision  features  are  approximated,  as  modified  normal 
distributions,  and  are  used  to  trisect  the  feature  spaces  (Figure  4.3). 
True  (t)  records  a  strong  detection  of  a  feature,  "maybe"  (?)  a  weatc 
detection  of  the  feature,  and  false  ( f )  a  sure  non-detection.  This 
three  state  decision  property  is  recorded  rather  than  an  actual  feature 
vector  component . 


Two  Sided  Distributions  One  Sided  Distributions 

One-sided  and  Two-sided  Decision  Models 
Figure  4.3 

Each  H-C  decision  feature  is  related  to  some  question  we  would  like 
answered  concerning  the  intrinsic  properties  C3]  of  the  image  generating 
our  graph  structures.  We  summarize  each  decision  feature  as  follows: 


85 


Feature  0-3;  Intensity  Cnops 

Intensity  Is  scaled  so  that  the  minimum  intensity  is  0  and  the 
maximum  intensity  is  1  (so  as  to  be  light  level  independent). 
Feature  0  is  the  region  lightness  minus  the  0.5  level  (for  the 
region  linked  via  the  half  chunk  R-link).  Is  the  region  lighter 
than  average  or  darker  than  average?  Feature  2-3  are  successive 
intensity  chops  (divides  the  half  range,  the  quarter  range,  and  the 
eighth  range). 


Feature  4;  Boundary  Diffusion 

Nominal  Diffusion  is  3  (3  pixels)  due  to  the  Imaging  system. 
Is  a  specific  boundary  more  diffused  (i.e.  likely  to  be  a  shadow)? 

Feature  5;  Lightest/Darkest 

If  the  intensity  is  lower  than  0.5,  then  this  feature  is  the 
region  intensity  minus  the  darkest  region  intensity.  If  the 
intensity  is  higher  than  0.5,  then  it  is  the  lightest  region 
intensity  minus  the  region  intensity.  Regions  which  are  nominally 
the  lightest  in  a  scene  could  be  direct  reflections  of  the  light 
source  (or  a  highlight).  Darkest  regions  could  be  unlit  (or  deep 
shadows) . 

Feature  6;  Intensity  Step  Size 

This  is  the  difference  of  Intensities  across  a  boundary.  Is 
the  boundary  a  light  ->  dark  boundary  or  a  dark  ->  light  boundary? 

Feature  7;  Difference  Of  Boundary 
Intensity  To  Region  Center  Intensity 

This  is  obtained  by  subtracting  the  intensity  along  a  boundary 
from  the  average  region  intensity.  Surfaces  which  roll  away  from 
the  observer  get  darker  towards  the  boundary  (less  reflected  light). 
Flat  surfaces,  with  abrupt  Junctions  to  other  surfaces  have  roughly 
constant  lightness. 
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Feature  8;  Variation  Of  Feature  5  Along  A  Boundary 

This  is  the  standard  deviation  of  Feature  5  along  the  boundary. 
This  detects  a  uniform  lightness  ratio  across  a  surface-surface 
Junction.  This  can  occur  when  a  surface  is  painted  on  another. 

Feature  9;  Intensity  Variation  Along  A  Boundary 

This  is  the  standard  deviation  from  the  mean  intensity  along 
the  boundary.  This  feature  becomes  large  along  rolling  surface 
boundaries . 

Feature  10;  Variation  in  Region  Intensity 

This  feature  is  the  standard  deviation  of  average  region 
lightness.  If  non-zero  it  implies  the  region  is  rolling  and  lit  by 
a  light  source. 


Feature  11;  Average  Roughness 

Tnis  feature  is  the  average  edgedness  for  a  region. 

Feature  12;  Boundary  Re3l/Subjective  Index 

Tnis  feature  is  the  ratio  of  real  edges  per  edge  for  a  bounding 
contour.  It  is  a  function  of  how  broken  the  bounding  contour  was 
prior  to  region  aggregation. 

Feature  13;  Depth  Variation 
■•After  Depth  Correlation  Only** 

Planes  are  locally  fitted  to  all  correlated  n?if  chunks  after  a 
depth  binding  operation.  Any  H-C  with  an  anomalous  depth  relative 
to  its  average  plane  is  marked  as  having  a  depth  variation. 
Highlights  can  appear  as  either  under  or  over  the  average  surface 
level.  Occluded  Junctions  may  also  display  anomalous  depths. 


-v»  V 
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Feature  14;  Motion  Variation 
••After  Motion  Correlation  Only** 

Planes  are  locally  fitted  to  motion  correlated  half  chunKs. 
H-C's  with  anomalous  velocities  are  marked.  Shadows,  highlights, 
and  occluded  boundaries  can  show  this  affect. 

Feature  15;  Region  Embedding  Level 

Tnis  is  not  a  decision  feature  like  the  rest.  No  statistical 
decision  is  involved.  The  (t)  value  represents  the  largest  ground 
region.  The  (f)  value  represents  the  innermost  level  of  embedding 
(highlights  tend  to  be  at  this  level).  And  the  (?)  value  marks  all 
intermediate  region  embedding  levels. 

Using  the  half  chunk  we  can  encode  regions,  strings,  and  vertices 
in  easy  to  traverse  graph  structures,  which  allow  relatively  easy  graph 
matching  (Figure  4.4).  The  modification  or  removal  of  relationships 
during  a  matching  sequence  (such  as  the  disassociation  of  two  adjacent 
regions),  can  be  easily  accomplisned  by  simply  breaking  links. 
Insertion  of  new  surfaces  uncovered  in  the  process  of  successive  viewing 
can  be  easily  incorporated  by  "ripping"  boundaries  in  half  and  linking 
in  the  half  chunks  bounding  the  new  uncovered  surfaces  (Figure  4.5). 
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Certain  types  of  "Interesting"  structures  become  apparent  in  the 
H-C  graphs.  Vertices  turn  out  to  be  circuits  along  X-links.  Regions 
are  formed  by  circuits  along  E-links  (or  by  H-C’s  snaring  the  same 
R-link) .  A  generalized  "interesting"  thing  is  something  which  makes 
chains  and  particularly  closed  chains. 

4.3.  Graph  Hatching  (Depth  and  Motion  Correlation) 

Finding  matches  between  non-identical  graphs  is  in  general  a 
difficult  problem,  certainly  much  more  difficult  than  finding  simple 
isomorphisms.  If  we  do  not  expect  to  find  many  actual  isomorphic 
matches,  then  a  natural  way  of  ordering  all  possible  non-isomorphic 
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matchings  la  via  the  idea  of  coat  [55].  Me  can  invent  some  basic 
atom-atom  and  link-link  coat  functions,  and  can  define  an  overall  natch 
coat  aa  some  function  of  the  individual  component  (atom  or  link)  coats 
for  all  the  bindings  between  the  matched  graphs.  If  we  have  good 
component  coat  criteria,  then  good  graph  matches  will  also  have  low 
cost.  We,  therefore,  need  to  search  the  possible  graph  binding  space 
(this  space  consists  of  all  possible  atom-atom,  and  link-link  bindings 
between  two  input  graphs),  for  minimum  cost  component  binding 
configurations . 

If  we  choose  a  metric  (such  as  Euclidean  distance)  as  the  basic 
element -element  cost  function,  we  may  need  to  evaluate  large  numbers  of 
bindings  to  find  the  optimal  whole  graph  binding.  This  computational 
expense  can  be  significantly  reduced  by  typing  the  individual  atoms  and 
links  and  allowing  comparisons  only  between  equivalent  types.  We  can, 
in  many  cases,  do  this  typing  without  any  real  loss  in  power.  We  would 
not  generally  compare  surfaces  with  vertices,  atoms  of  any  type  with 
links,  or  contours  considered  very  diffused  with  ones  razor  sharp. 

This  brings  us  to  our  particular  match  scheme.  Three  critical 
ideas  have  been  useful  in  patterning  our  algorithm.  First,  we  constrain 
any  final  matching  of  graphs  to  those  in  which  only  one-to-one  atom 
(link)  and  one-to-none  atom  (link)  matches  occur.  Second,  we  match 
primarily  by  "covering",  and  only  secondarily  by  atom  (link)  to  atom 
(link)  metrical  cost.  And  finally,  matching  processes  are  seeded  at  all 
highly  descriptive  unique  matching  points,  and  allowed  to  compete  with 
each  other  in  a  parallel  fashion  for  the  remaining  unbound  atoms  and 
links.  In  good  situations,  most  of  the  individual  matching  processes 
eventually  grow  into  contact  with  their  neighbors,  find  compatibility 
between  the  neighbors  and  themselves,  and  merge  with  the  neighbors.  In 
this  way  large  areas  of  a  graph  structure  are  matched  in  parallel. 


Generally,  we  are  require  six  costs  in  a  graph  aatching  scheme: 

A  is  an  atom  from  graph  Ga 
B  Is  an  atom  from  graph  Gb 
La  is  a  Link  from  Ga 
Lb  is  a  link  from  Gb 

4.3.1.  f(A.B)  Cost  for  matching  A  and  B 

4.3.2.  g( A)  Coat  for  matching  A  to  null 

4.3.3.  u(B)  Cost  for  matching  B  to  null 

4.3.4.  h(La.Lb)  Cost  for  mate  ling  La  to  Lb 

4.3.5.  klLaJ  Cost  for  matching  La  to  null 

4.3.0.  v(Lb)  Cost  for  aatching  Lb  to  null 

For  the  cost  of  an  entire  match,  we  select  a  cost  (1-6)  for  each  atom  in 
Ga  (for  all  A)  and  Gb  (for  all  B).  We  sum  the  selected  costs.  We  wish 
to  find  the  matches  where  the  cost  is  minimal. 

4.3.7.  C  *  £f( A;  ,B; )  ♦  £g(Aj  )  ♦  £u(B,  ) 

♦  ]jTh(La;  ,Lbj  )  ♦  ^k(La,-  )  ♦  Jj^vtLbj  ) 

In  the  most  general  atom  matching  directed  scheme,  we  will 
necessarily  have  to  check  every  atom  A  (in  graph  Ga)  against  every  atom 
B  (in  graph  Gb).  And  in  the  case  where  the  costs,  particularly  the  atom 
"atch  associated  costs  (1-3),  vary  as  function  of  things  already 
matched,  even  sore  comparisons  may  be  required.  The  link  associated 
costs  (4-6)  will  be  designed  to  grow  primarily  as  a  function  of 
mismatches  which  are  a  function  of  badly  matched  atoms,  rather  than 
mismatches  due  to  atoms  matched  to  null.  This  complicates  the  general 
scheme  by  predicating  the  link  match  cost  on  the  atom  matches.  Finally, 
if  we  have  atoms  of  different  types  and  links,  it  is  not  immediately 
obvious  how  we  should  weight  the  components  of  graph  matching  cost 
associated  with  each. 

We  get  around  most  of  these  problems  by  formulating  a  simple 
"oovering"-based  costing  scheme.  If  we  type  atoms,  we  need  never 
compare  atoms  of  different  type.  They  always  are  known  to  yield 
infinite  cost.  We  may  use  the  array  of  three  state  properties  in  the 
half  chunk  in  the  same  manner  as  a  type  if  we  broaden  the  idea  of  "being 
of  equal  types",  to  the  idea  of  "being  potentially  of  equal  types". 
That  is,  atom  A  covers  atom  B  if  A  could  be  B  and  B  could  be  A.  In  our 
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three  state  scheme  this  kind  of  check  is  easy.  Tne  t  (true)  property 
covers  ?  (maybe)  and  t,  the  f  (false)  property  covers  ?  and  f,  and  the 
?  property  covers  all  (figure  4.6).  It  may  seem  that  simple  three 
state  feature  properties  would  be  very  weak,  and  indeed  individually 
they  are.  However,  the  three  values  are  sufficient  to  answer,  any 
Boolean  question  posed  concerning  a  given  grouping  of  feature  vectors, 
and  if  many  such  questions  are  asked,  a  powerful  discriminator  can  be 
formed  from  the  answers. 


Example : 

t  t  t  f  f  f 
?  ?  t  f  ?  ? 

t  t  t  f  f  f  covers 

t  t  t  f  f  f 
?  c  f  f  c  ? 

t  t  ?  f  ?  f  does  noc  cover 

Covering  Rules 
Figure  4.6 

For  two  half  chunk  atoms  to  be  possible  matches,  they  must  "cover" 
each  other,  be  geometrically  similar  (i.e.  close  enough  in  tangent, 
curvature,  and  size),  and  must  be  involved  in  encoding  the  same 
topological  relations  (i.e.  must  have  compatible  link  structure  and  be 
either  all  vertex  derived  or  contour  derived).  The  matching  algorithm 
uses  one  graph  as  a  template  for  the  other.  About  each  H-C  in  the 
template  a  neighborhood  (circular  for  motion  matching,  along  the 
direction  of  parallax  for  depth  matching)  is  expanded  and  all  H-C's  from 
tne  other  graph  falling  in  the  neighborhood  are  examined  for  covering, 
topological  consistency  and  metrical  cost.  Any  H-C's  with  unique  (or 
no)  matches  are  marked  (bound  to  tne  unique  match,  or  marked  as  matching 
null).  Each  unique  match  acts  as  a  seed  around  which  a  local 
Interpretation  may  grow  (if  no  unique  matches  are  generated  the  matching 
with  the  lowest  metrical  cost  is  chosen  as  a  seed,  but  this  has  actually 
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never  occurred  except  while  processing  some  test  datasets  -  Figure  4.7). 
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Neighborhood  Expansion,  Seed  Formation 
Figure  4.7 

After  binding  up  all  unique  matches,  some  previously  ambiguous 
matches  (H-C's  matching  more  than  one  in  the  other  graph),  will  now  be 
unambiguous.  This  happens  because  of  link  consistency.  If  atom  A 
matches  uniquely  atom  B,  A  connects  through  an  endpoint  link  to  C,  3 
connects  through  an  endpoint  link  to  0,  and  C  ambiguously  matches  D,  E, 
F,  and  G,  then  after  binding  A  and  B,  clearly  we  should  bind  C  and  D 
(Figure  4.8).  We  can  therefore  Imagine  unique  bindings  causing  new 
unique  bindings  through  any  links.  By  a  relaxation  scheme,  these 
influences  are  propagated  until  the  various  seed  centers  grow  to 
collision . 
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A  Unique  Influencing  Adjacent  Non-unique 
Figure  *.5 

When  two  seed  centers  collide,  if  they  represent  mutually 
consistent  bindings  (i.e.  they  both  cause  the  template  atoms  at  the 
collision  area  to  be  bound  to  the  same  atoms  in  the  other  graph)  seed 
growth  stops,  and  the  matching  is  locally  finished.  If  the  seeds 
disagree  about  particular  bindings,  the  interpretations  with  the  lowest 
metrical  cost  are  adopted.  Thus,  one  seed  may  steal  atoms  and  links 
from  another,  and  may  ultimately  obliterate  the  other.  Because  of  our 
covering-based  method  for  seed  selection,  we  generally  get  good  seeds 
evenly  spread  over  the  image  graphs.  Therefore  very  little  competition 
actually  occurs  between  seeds,  and  rarely  is  a  seed  totally  absorbed  by 
its  neighbors. 

Just  as  the  neighborhoods  expanded  for  motion  matching  differ  from 
those  expanded  for  depth  matching,  the  metrical  cost  functions  differ 
(Figure  4.9).  For  depth  we  allow  small  movement  in  the  parallax  for  no 
cost,  but  require  close  fit  in  tangent,  curvature  and  non-parallax 
variation.  For  motion  we  allow  small  movements  in  X  or  X,  but  constrain 
tangent  (though  not  as  much  as  in  the  depth  case),  and  curvature. 
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Motion  Cost-- 
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Dept ft  and  Motion  Costs 
Figure  4.9 

For  both  cases,  once  a  binding  transform  for  one  H-C  is  computed 
(from  a  unique  binding),  this  transform  is  used  for  computing  cost  for 
connected  H-C's.  It  is  assumed  that  objects  are  relatively  rigid. 

4.4.  Interpreting  Bindings  (Transforms  and  Graph  Merger) 

After  a  completed  matching  sequence  we  have  the  two  original  graphs 
(i.e.  their  atoms  and  links),  and  a  set  of  atomic  bindings  (which  imply 
link  bindings  -  if  A  binds  to  B,  A  is  linked  to  C,  C  binds  to  D,  and  B 
is  linked  to  0,  then  the  link  from  A  to  C  is  bound  to  the  link  from  B  to 
D  -  Figure  4.10).  We  must  now  use  this  binding  information  to  fuse  the 
two  graphs,  forming  new  graph  elements  associated  with  3-space 
coordinates  vectors  (in  the  case  of  depth  matching),  or  velocity  vectors 
(in  the  case  of  motion  matching). 
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Link  Matching  Implied  By  Atom  Matches 
Figure  4. 10 

The  actual  computation  of  a  depth  or  a  velocity  for  single  half 
chunks,  or  clouds  of  half  chunks  in  connected  subgraphs  is  really  quite 
straightforward  matrix  algebra  (on  potentially  overspecified  systems). 
We  have  chosen  to  simplify  our  depth  computations  by  assuming  that  the 
objects  we  see  are  relatively  far  from  the  camera  system,  minimizing 
perspective  distortions.  If  this  is  true,  Z  (depth)  becomes  negatively 
proportional  to  parallax. 
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The  main  advantages  of  this  approach  to  computing  X,  Y,  Z  are 
simplicity,  and  the  inherent  isolation  of  parallax  errors  to  only  the  Z 
component.  X  and  X  measurements  are  relatively  accurate  (good  to  one 
pixel  position),  and  P  (parallax)  measurements  are  good  for  computing 
relative  depth.  However,  in  translating  X,  X,  P  into  absolute  X,  X,  Z 
significant  errors  due  to  P  can  occur.  It  is  better  to  isolate  these 
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purely  in  the  Z  coordinate. 

■  For  action  transform  computation  we  assume  small  overall  motions. 
Therefore  we  ignore  any  overall  scale  changes.  We  compute  three 
different  motion  transforms,  one  assuming  simple  translation,  one 
assuming  translation  and  a  rotation  in  the  plane  of  the  image  only,  and 
one  assuming  translation  and  a  general  rotation  in  3-space.  We  select 
the  most  complicated  transform  which  is  not  singular  (or  near  singular) . 
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As  has  been  shown,  computation  of  transforms  of  various  Kinds  is 
easy  following  the  graph  aatch.  What  is  actually  auch  harder  is  to 
produce  an  new  graph  (3-diaensional,  or  with  moving  half  chunks)  which 
is  consistent  and  preserves  as  auch  of  the  associations  of  the  original 
input  graphs  as  possible.  Producing  this  aerged  graph  is  a  heuristic 
process,  however  soae  guiding  principles  are  apparent. 

If  the  basic  signal  data  encoded  in  our  graphs  is  good  (i.e. 
accurately  reflects  object  structure),  we  expect  nearly  perfect  aatching 
between  individual  scene  graphs,  where  the  underlying  object  structure 
is  eoaoon.  In  fact,  we  get  this  criterion  as  long  as  we  correct  for  one 
problea.  We  need  to  allow  for  boundary  expansion  or  contraction  due  to 
slight  changes  in  object  viewing  perspective.  We  deal  with  the  problea 
by  allowing  saall,  unbound  half  chunks  surrounded  by  larger,  bound  half 
chunks  to  be  absorbed  into  their  bound  neighbors  (Figure  4.11). 


Boundary  Contraction 
Figure  4.11 

The  basic  strategy  for  graph  aerging  is  based  on  believing 
atoa-atom  bindings,  and  breaking  in-graph  links  until  each  graph  aay  be 
unambiguously  overlayed  over  the  other.  This  basically  means  that  atoms 
in  one  graph  may  have  links  (in  one  of  the  five  possible  terminals) 
where  atoms  in  the  other  do  not,  or  corresponding  atoms  may  both  be 
missing  corresponding  links,  but  if  both  have  links  they  must  be 
compatible.  That  is,  if  endpoint  A  links  to  3,  and  endpoint  C  links  to 
D,  and  A  binds  across  graphs  to  C,  then  3  must  bind  to  D.  If  this  is 
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not  trua  then  one  or  ooth  of  the  in-graph  links  (A  to  B,  or  C  to  D)  must 
be  broken  (Figure  4.12).  The  link  which  is  broken  is  the  one  which  can 
be  explained  aost  easily  as  an  artifact  of  newly  uncovered  data,  seen  in 
one  image  (an  therefore  its  graph),  but  not  seen  in  the  other  (Figure 
4.13). 


Legal  and  Illegal  Bindings 
Figure  4.12 
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Links  Broken  Because  of  Inconsistency 
Figure  4.13 

Figure  4.14  shows  three  images  processed  up  to  the  basic  V-S-S 
graph  form  (A.  right  t=0  image,  left  t=0  image,  right  t=1  image;  B. 
Edges;  C.  Image  constructed  from  the  region  assignment  data;  D. 
Tracing  constructed  from  the  basic  V-S-S  graphs).  Figure  4.15  shows  the 
basic  V-S-S  form  broken  in  an  object  segment  directed  way  (by  segment 
curvatures  -  Raster's  method),  converted  to  half  chunks  and  correlated 
for  depth  and  motion  (bindings  are  marked  in  yellow).  Figure  4.16  shows 
the  V-S-3  form  broken  into  smaller  pieces,  converted  to  half  chunks,  and 
correlated.  Figure  4.17  shows  the  same  results  (in  less  detail)  for  an 
interesting  metal  cutout  (the  3-d  model  is  rotated  in  three  dimensions 
for  display).  Figures  4.18  and  4.19  show  other  scenes  processed  for 
object  models  (only  one  '  view  and  one  time  interval  is  displayed  - 
objects  found  will  be  displayed  in  the  next  chapter). 


Generated  From  V-S-S  Graphs 

Processing  Prior  to  Matching  for  Depth 
and  Motion 
Figure  4.14 
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4.5.  Conclusions 

We  described  a  method  for  correlating  visual  Information  at  the 
level  of  object  structure,  via  a  graph-based  formalism,  rhis  method  is 
very  good  for  wide  angle  depth  correlations,  motion  correlations  and 
correlations  where  the  transformation  talcing  one  graph  to  another  is 
ill-defined.  This  method  is  powerful  when  used  in  conjunction  with 
graphs  made  up  of  individually  powerful  (in  terms  of  discrimination 
power)  atomic  pieces  of  object  structure.  We  have  introduced  such  an 
atomic  unit  of  structure  for  the  world  of  man-made  artifacts  (or  objects 
primarily  recognized  by  their  shape),  in  the  half  chunk.  We  should  note 
that  the  idea  of  the  half  chunk  does  not  depend  on  graph  matching,  nor 
graph  matching  on  the  half  chunk  (we  could  have  graph  matched  on  the 
V-S-S  graphs,  but  with  more  complexity  due  to  the  different  atomic 
types ,  vertices-strings-surf aces ) . 

Two  major  problems  which  arise  when  employing  our  graph  oriented 
approach,  are  the  problem  of  getting  good  graphs,  and  the  problem  of 
designing  good  atom-atom  correlates.  The  latter  problem  we  have 
addressed  by  characterizing  atom  (H-C)  properties  as  three  state 
decisions  properties  and  using  a  "covering"  based  comparison  scheme. 
The  former  problem  is  unsolved,  and  requires  significant  technical 
effort.  Our  results  in  the  area  are  promising  enough  to  make  us  think 
obtaining  good  graphs  is  possible. 


Chapter  5 


Object  Identification 


5.1.  Introduction 

Object  models  have  traditionally  been  represented  as  either 
semantic  net  like  graphs  [65][62][59],  or  as  object  centered,  rigid 
body,  surface  descriptions.  The  semantic  net  approach  is  deficient 
because  dimensional  information  needed  for  exact  object  reconstruction 
is  not  incorporated.  The  surface  description  form  has  been  used  for  3-d 
hidden  line/surface  graphics,  and  some  recognition  work.  Baumgart's 
system  was  primarily  a  graphics  based  system,  which  was  applied  to 
object  representation.  Baker  tracked  contour  information,  but  still 
produced  solid  models  which  represent  surface  samples.  Perkins  stores 
2-d  models  but  also  uses  an  object  centered  representation.  Burr 
represented  objects  as  object  centered  3-d  segments.  Object  centered 
models  are  good  for  display,  however,  are  deficient  for  recognition 
because  each  new  viewing  angle  generates  models  which  are  parametrized 
in  different  spaces.  To  match  models,  first,  correspondences  must 
heuristically  be  established,  and  at  least  one  model  reparametrized  (so 
both  are  in  a  common  coordinate  system).  This  extra  processing  obscures 
the  matching  process.  It  has  been  our  conclusion  that  object 
representation  and  subsequent  recognition  does  not  match  either  of  the 
two  previously  described  modeling  techniques  well. 
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Several  works  have  suggested  modeling  techniques  which  instantiate 
some  critical _  object  feature  relations.  The  generalized  cylinder 
approach  is  like  this.  Many  objects  can  be  captured  by  specifying 
relations  along  major  axes  [44][37l.  The  trouble  is  that  every  object 
variety  does  not  have  these  major  symmetry  axes. 

Let  us  first  list  the  requirements  of  a  model  representation  scheme 
and  a  method  for  employing  such  a  scheme  for  recognition: 

1)  Tne  modeling  method  should  potentially  be  able  to  capture  object 

structure  exactly. 

2)  The  modeling  method  should  be  able  to  capture  partial  object 

descriptions  (in  the  sense  of  missing  views). 

3)  The  method  should  allow  for  representation  of  objects  at  varying 

levels  of  exactness. 

4)  Models  should  be  built  by  the  system  automatically.  It  should  be 

possible  to  enhance  a  model  with  new  Information  to  form  a  more 
exact  model. 

5)  Models  of  varying  exactness  and  completeness  should  be 

comparable . 

6)  Modeling  should  take  into  account  redundancy  of  structural 

components  in  real  objects  to  form  more  compact 

representations . 

7)  It  should  be  possible  to  recognize  two  very  different  objects  as 

different  with  less  computational  effort  than  to  recognize  two 
similar  objects  as  different. 

8)  Matching  should  not  require  exact  matches. 

9)  The  primitive  elements  and  relations  used  to  encode  objects 

should  explicitly  encode  elements  and  relations  required  in  tne 
recognition  process. 
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10)  Models  should  be  comparable  without  reparaaetrization. 

Tne  semantic  modeling  approach  satisfies  (2)  because  complete 
semantic  graphs  need  not  be  encoded,  (5)  and  (9)  because  presumably 
ideal  models  would  all  contain  critical  relations  and  structure,  and 
(10)  because  critical  relations  and  structure  will  not  change  from  one 
paraaetrization  to  another.  The  solid  model  approach  satisfies  (i) 
because  complete  dimensional  information  is  stored,  (2)  because 
incomplete  information  is  possible,  (3)  because  all  data  can  be  stored 
in  an  averaged  form  (a  "defocused" ,  lower  dimensional  form),  (5)  only  if 
an  averaging  process  exists  which  can  match  models  of  differing  size  or 
dimension  (i.e.  (5)  is  not  satisfied  passively  -  a  computational 
process  is  required),  (9)  if  simple  dimensional  information  is  require 
only  (i.e.  (9)  is  not  satisfied  if  ws  want  complex  topological 
relationships  to  be  represented) .  Briefly,  neither  pure  object  centered 
solid  representation  nor  semantic  representation  has  all  the  features  we 
desire  for  an  object  modeling  scheme,  although  the  solid  modeling 
approach  seems  to  be  better.  The  generalized  cylinder  approach  is 
better  than  either  except  that  not  all  objects  are  amenable  to  it  (i.e. 
for  some  objects  like  blocks  major  axis  encoding  does  not  instantiate 
important  relationships). 

We  encode  objects  in  a  novel  scheme  based  on  the  "half  chunk" 
graphs  produced  by  our  depth-motion-labeling  process  (see  Appendix  D  for 
a  brief  description  of  the  labeling  pass  following  depth-motion 
correlation).  Recall  that  the  basic  half  chunk  (H-C)  is  a  small  section 
of  object  boundary  curvature,  which  carries  an  array  of  properties 
describing  the  nature  of  the  surface-surface  interface  locally. 
Dimensionally,  a  change  in  tangent  angle  and  two  lengths  (lengths  of  the 
legs)  are  carried.  In  the  original  2-d  half  chunk  encoding,  each  H-C 
also  carries  a  tangent  and  the  coordinates  at  the  center  and  end  of  both 
legs.  These  parameters  are  removed  (tangent  angle)  or  store  in  a 
separated  structure  (coordinates  of  the  last  known  position  for  each  H-C 
is  stored  in  the  XYZ  atoms  -  these  each  index  to  a  H-C  atom,  but  are  not 
used  in  our  recognition  algorithm  -  the  XYZ  atoms  are  kept  as  data  for 


low  Level  feature  trackers) 


For  simple  objects  (Figure  5.1),  the  object  H-C  graph  does  not 
contain  enough  information  to  reconstruct  a  dimensional  model  of  an 
object,  however,  when  the  object  complexity  increases  just  a  bit  (Figure 
5.2  and  5.3)»  complete  information  in  the  form  of  constraint  relations 
does  exist.  3y  constraint  relations  we  mean  that  each  H-C  carries  both 
dimensional  constraints  (delta  angle  and  lengths),  and  relational 
constraints  (X-link,  E-link,  and  R-link  bindings).  On  any  reasonable 
3-d  object  these  constraints  when  taken  in  large  enough  subgroup  sizes, 
completely  describe  the  solid  nature  of  an  object.  Me  call  this  form  of 
representation  "feature"  centered  as  opposed  to  "object"  center,  the 
form  commonly  used  for  solid  object  representation.  This  form  of 
modeling  allows  both  2-d  (Figure  5.2),  3-d  (Figure  5.3),  and  mixed 
dimensioned  objects  to  be  represented.  Also,  because  any  surface  or 
boundary  can  be  represented  by  a  concatenation  of  half  chunks,  to  any 
degree  of  precision,  the  requirement  of  exact  representation  can  be 
achieved. 
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A  Half  Chunk  Graph  With  Redundant  Reconstructions 

Figure  5.1 
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A  2-D  Half  Cnunk  Graph  With  A  Single  Reconstruction 

Figure  5.2 
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A  3-D  Half  Cnunk  Grape.  With  A  Single  Reconstruction 

Figure  5.3 
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Our  method  of  encoding  multiple  objects  uses  a  dictionary  of 
universal  half  chunks  which  are  referenced  via  tokens  in  all  object  H-C 
graphs.  An  object  model  consists  of  a  sequence  of  disconnected  2-d 
and/or  3-d  H-C  graphs.  An  arbitrary  number  of  individual  objects  are 
allowed  in  an  object  library.  Library  objects  are  compared  to  input 
objects  (formed  by  the  depth-motion-labeling  correlator  system)  by 
producing  and  comparing  series  of  subgraph  histograms,  beginning  with 
histograms  of  basic  H-C  frequency.  We  find  the  histogram  method  allows 
partial  object  matching,  matching  of  models  composed  at  differing 
precisions,  and  rapid  removal  of  unlikely  objects  from  the  matching  set 
of  objects.  We  believe  that  objects  could  actually  by  modeled  by  a 
series  of  subgraph  histograms  rather  than  by  the  original  H-C  graphs, 
thereby  reducing  recognition  from  a  series  traversal  of  graphs,  to  one 
of  a  series  comparison  of  histograms.  In  this  way,  recognition  of  one 
object  out  of  a  library  of  many  objects  can  be  made  to  be  significantly 
more  computationally  tractable. 

5.2.  A  Common  Basis  for  Object  Graphs  -  Histograms 

If  we  wish  to  compare  graphs  we  can  either  compute  some  graph 
descriptor  function  (such  as  the  number  of  atoms)  for  each  of  several 
graphs  and  compare  the  descriptor  values,  or  we  can  traverse  the  graphs 
in  some  way,  comparing  graph  elements  serially,  accumulating  a  match 
coefficient  or  cost  (as  done  in  depth  and  motion  matching).  If  we  take 
the  traversal  approach,  we  can  either  traverse  the  graphs  in  a  breadth 
first  or  a  depth  first  oriented  way.  Assume  that  we  have  a  library  of  J 
object  graphs  each  averaging  m  elemental  atoms  in  size.  If  we  wish  to 
compare  a  new  object  encoded  as  a  graph  of  n  atoms  to  the  library  of 
objects,  using  the  breadth  or  depth  oriented  search  approach  we  require 
on  the  order  of  J*m#n  compare  operations.  To  use  the  descriptor 
approach  we  need  only  J,m>n  operations  or  approximately  1/n  times  the 
traversal  approaches.  If  the  descriptors  extracted  from  the  object 
graphs  are  invariant  we  only  require  J*n  operations  (i/(m#n)  times  if  J 
dominates,  likely  for  large  object  libraries  -  i/(J#m)  times  if  n 
dominates,  likely  for  small  libraries  of  complex  objects).  Because  we 
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have  been  interested  in  methods  which  could  be  used  with  very  large  data 
bases  we  have  been  attracted  to  comparison  by  descriptor  aethods. 
Particularly  we  want  descriptors  that  are  either  invariant  or  easily 
updated  when  new  information  appears  (i.e.  that  do  not  require 
evaluation  of  all  the  information  pertaining  to  a  given  object  for  each 
acquisition  of  new  data). 

Tne  type  of  graph  descriptor  chosen  is  based  on  critical  subgraph 
histograms.  Because  one  descriptor  may  not  be  powerful  enough  to 
separate  all  pairs  of  graphs,  our  method  is  based  on  comparison  of 
families  of  descriptors,  each  accounting  for  higher  order  subgraphs.  Me 
first  match  each  primitive  half  chunk  in  new  object  graphs  to  the  half 
chunk  component  library.  This  library  contains  copies  of  all  half 
chunks  used  in  any  graph  currently  encoded  in  the  object  data  base.  Any 
half  chunks  not  having  a  match  in  the  component  library  are  entered. 
All  half  chunks  are  thus  replaced  by  H-C  tokens.  Tne  first  order 
histogram  of  an  object  graph  has  one  slot  for  each  H-C  in  the  components 
library.  Each  slot  is  incremented  for  an  occurrence  of  the 
corresponding  H-C  token  in  the  object  graph. 

To  compute  the  second  order  histogram  of  an  object  graph  we  must 
generate  all  subgraphs  with  two  atoms  (or  H-C's),  centered  at  each  atom. 
Me  make  a  sorted  list  of  the  subgraphs  (to  achieve  an  invariant 
ordering),  collapse  all  identical  entries  into  one,  and  use  this  as  a 
component  library.  Me  may  then  compute  a  histogram  using  the 
uncollapsed  list  of  binary  subgraphs  and  the  new  second  order  components 
list.  This  operation  may  be  performed  repetitively  for  higher  order 
subgraphs,  however,  each  level  is  computationally  more  expensive.  For 
arbitrary  graphs  computation  becomes  expensive  quickly,  however,  because 
the  H-C  graph  primitive  has  limits  on  link  fan-out,  it  exibits  more  well 
behaved  growth.  Me  still  do  not  find  it  practical  to  expand  extensively 
beyond  second  or  third  order  histograms.  This  corresponds  to  matching 
on  the  frequency  of  curvatures,  pairs  of  curvatures,  and  trihedral 
vertices.  Beyond  the  order  three  there  is  reason  to  believe  that  a 
traversal  base  technique  might  perform  better  computationally. 


5.3.  Histogram  Comparison 

Comparing  object  graphs  has  been  reduced  to  comparing  lists  of 
histograms.  To  compare  lists  of  histograms  we  need  to  address  the 
problem  of  comparing  single  histograms,  one  from  a  new  object  graph  and 
the  other  from  a  library  of  object  histograms  (or  extracted  from  a 
library  of  object  graphs).  Tne  simplest  comparison  method  would  be  to 
accumulate  a  least-squares  like  difference  cost  between  the  two 
histograms  (Figure  5.4  -  (?„  +£*t+Cc*).  If  this  cost  exceeds  a  limiting 
cost,  the  histograms  are  different,  otherwise  they  are  the  same, 
perturbed  by  measurement  noise.  This  method  is  undesirable  because  any 
new  packet  of  views  of  an  object  is  likely  to  be  incomplete  (unless  we 
explicitly  intended  the  packet  as  a  learning  set  and  therefore  included 
sufficient  information  to  build  an  adequate  model). 

Us  generally  see  only  a  few  sides  or  discriminating  features  of  an 
object  during  a  recognition  set,  this -being  enough  to  separate  a  given 
object  from  our  space  of  all  objects.  Except  during  initial  object 
learning,  we  expect  new  object  packets  to  be  subsets  of  an  object 
already  encoded  into  our  library.  Therefore,  matching  error  is  only 
accumulated  on  histogram  differences  where  new  object  features  are  not 
"covered"  by  old  object  library  entries  (Figure  5.4  -£*„„). 
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Histogram  Cost  Definitions 
Figure  5.4 


Me  will  evaluate  all  three  Histogram  matching  costa  £«*  ,  and 
Cc *  even  though  the  first  alone  is  used  to  determine  object  matching. 
The  other  costs  can  be  used  heuristically  in  reducing  search  when 
libraries  of  objects  get  quite  large.  As  long  as  C •*  is  smaller  than  a 
preset  limiting  value,  any  mismatch  is  assumed  to  be  due  to  information 
present  in  the  new  object  histogram,  not  currently  reflected  in  the 
library  generated  histograms.  This  new  information  is  added  to  the 
library,  refining  the  old  object  information,  therefore  allowing  the 
system  to  "learn”  about  objects.  On  the  other  hand  if  the  mismatch  cost 
is  greater  than  the  limit,  a  new  object  entry  must  be  formed. 

This  points  to  a  very  important  phenomenon  of  "learning  by 
similarity"  systems.  If  the  system  is  allowed  to  learn  too  fast  (i.e. 
the  mismatch  cost  limit  is  too  high),  all  new  data  sets  will  be 
incorporated  into  one  of  a  very  few  object  models.  If  the  system  is 
constrained  to  learn  very  slowly  (i.e.  mismatching  errors  must  be  very 
small  for  equivalence),  new  data  sets  tend  to  form  entirely  new  object 
models  easily.  Me  want  practical  systems  to  operate  between  these  two 
extremes.  Imagine  each  object  packet  generating  a  point  in  an  object 
library  space.  For  practical  systems,  we  require  a  metric  in  this  space 
which  gives  a  low  value  between  points  associated  with  the  same  object, 
and  significantly  larger  values  between  points  associated  with  different 
objects.  As  a  corollary,  we  expect  better  performance  the  larger  the 
average  distance  between  different  object  centers.  Therefore,  even  if 
we  have  a  good  metric,  we  need  to  start  the  system  with  "good”  (i.e. 
far  separated)  seed  object  examples,. 

5.4.  Strategies  for  Constraining  Search 

In  principle,  we  can  form  a  series  of  histograms  from  any  object 
graph,  then  compare  this  series  with  similarly  generated  series  from 
other  graphs  to  establish  matching.  Tnis  can  be  done  between  any  input 
object  graph  and  any  graph  in  an  object  library.  For  large  object  data 
bases  this  is  ridiculous  because  of  computational  constraints.  Me  have 
programmed  into  our  system  several  different  ways  for  artificially 
constraining  the  object  search  (constraining  the  search  without 
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guaranteeing  optimality).  We  may  operate  in  new  object  mode,  object 
verification  mode,  and  constant  computational  effort  mode.  For  all 
operating  modes  first  order  histograms  for  all  objects  in  the  library 
are  precomputed  and  stored.  Each  new  input  object  graph  is  processed 
individually  and  the  library  file  updated  accordingly. 

For  new  object  mode,  the  first  order  histogram  for  the  input  graph 
is  computed.  Tnis  histogram  is  compared  against  all  the  precomputed 
first  order  histograms.  If  it  is  dissimilar  from  all  the  others,  a  new 
object  entry  (and  associated  histogram)  is  formed.  Thus  an  new  object 
is  "learned".  If  only  one  histogram  from  the  library  is  similar,  a 
match  is  declared  and  the  new  object  graph  is  concatenated  to  the  graph 
data  stored  for  the  matched  object  (matched  histogram  is  updated 
accordingly).  If  several  histograms  are  similar,  each  is  expanded  to 
the  second  order,  along  with  the  one  from  the  input  object  graph.  Tne 
matching  of  the  second  order  histograms  proceeds  as  did  the  first. 
Expansion  of  higher  order  histograms  continues  until  either  a  unique 
match  is  found  or  all  potential  matches  are  discarded  (a  new  object 
entry  is  composed).  This  strategy  allows  new  objects  to  be  entered  and 
old  objects  to  be  recognized.  Computational  load  is  roughly 
proportional  to  k*n  where  n  is  the  number  of  objects  in  the  library  and 
<  is  related  to  average  object  complexity.  Most  of  the  work  is  done 
with  the  first  order  histograms,  thus  eliminating  large  portions  of  the 
search  space  after  one  level. 

For  object  verification  mode,  we  disallow  any  new  object  entries. 
Again  the  first  order  histogram  for  the  object  to  be  recognized  is 
computed  and  compared  to  all  first  order  histograms  in  the  library.  If 
we  allow  matching  of  no  objects  (i.e.  this  object  is  allowed  to  match 
none),  the  matching  proceeds  as  in  the  new  object  mode  case,  except  that 
no  new  objects  are  entered  into  the  library  when  all  matches  fail.  If 
we  insist  on  a  match,  and  we  get  to  a  level  of  comparisons  (first  order 
or  higher)  where  no  possible  matches  occur,  we  select  the  lowest  cost 
match  (using  as  the  index  of  cost)  as  a  correct  match.  This 
basically  forces  the  selection  of  an  object.  This  matching  mode  will 
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have  better  performance  in  error  prone  environments,  at  the  expense  of 
disallowing  new  object  model  building.  Computationally,  this  method  of 
operation  is  equivalent  to  the  new  object  mode. 

In  the  constant  computational  effort  mode  matching,  we  constraint 
object  space  searching  artificially  by  examining  matching  costs  (using 
6 ev  and^tw).  The  heuristic  used  does  not  guarantee  optimality,  but 
generally  yields  good  results  for  large  object  libraries.  We  expand  the 
first  order  histogram  for  the  input  object  graph,  and  compare  to  the 
precomputed  histograms.  We  eliminate  all  histograms  not  considered 
similar.  We  treat  the  single  and  null  match  as  previously.  However,  in 
the  case  of  the  multiple  match  we  disallow  all  but  the  m  best  matches  (m 
is  a  specified  constant ,  and  best  to  worst  is  ranked  using  the  Cnn  cost 
•  similarity  1s  still  based  on  thresholding  the  ^»v  cost).  If  after 
selection,  multiple  matches  still  abound,  we  descend  to  second  order 
histogram  based  comparisons.  If  these  yield  multiple  matches,  we 
examine  the  third  order  matches.  The  parameter  m  is  roughly  equivalent 
to  the  depth  of  our  most  intensive  search  in  histogram  orders.  In  this 
way  we  can  artificially  limit  computation  during  object  matching. 
Intuitively,  it  seems  reasonable  that  recognition  can  be  accomplished 
for  most  machine  parts  with  limiting  search  to  three  histogram  orders  - 
curvatures,  paired  curvatures,  and  trihedral  vertices. 

Figures  5.5  and  5.6  show  the  objects  recognized  and  entered  in  our 
test  object  libraries  generated  by  runs  on  the  data  set  shown  in  figures 
4. 14-4. 19.  Note  that  while  some  data  will  be  missing  for  specific 
instances  of  an  object,  the  data  that  is  present  has  an  invariant 
relationship  to  model  data  (with  some  measurement  error). 
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5.5.  Using  Histograms  as  the  Models 

In  the  current  iopleoentation  of  the  object  modeling  system,  we 
have  a  basic  half  chunk  component  library  ordered  by  intrinsic  half 
chunk  value  (sorted  on  all  H-C  properties,  inducing  an  invariant  order). 
Sach  H-C  is  given  a  token  value  at  the  time  of  entry  into  the  component 
library.  All  H-C  graphs  for  objects  are  re-encoded  using  this  token 
value  in  lieu  of  actual  half  chunks.  Prior  to  object  recognition,  the 
first  order  histograms  for  each  object  model  are  computed  from  these  H-C 
graphs,  stored  in  the  object  library.  As  required,  higher  order 
histograms  are  computed  during  object  matching  sequences.  Because  we 
have  found  that  generally  only  limited  levels  of  histogram  orders  really 
need  to  be  traversed  for  most  sets  of  objects,  it  seems  quite  reasonable 
to  eliminate  actual  graph  storage  in  the  future. 

Modeling  objects  as  histograms,  or  as  parameters  wnich  characterize 
(such  as  and  )  histograms  is  quite  appealing,  because  it  allows 
us  to  unify  shape  and  texture  recognition  techniques.  He  have  pointed 
out  elsewhere  (Chapter  3)  that  texture  is  best  modeled  as  an  overdose  of 
shape  information.  Using  histograms  of  edge  and  edge  connector 
densities  and  directions  Marr  has  described  many  elemental  textures. 
Using  our  histogram  techniques  we  can  describe  object  forms.  While 
histogram  segmentation  across  whole  images  many  be  questionable,  most 
elemental  difference-based  edge  detection  schemes  can  be  posed  as  local 
histogram  characterization  and  matching. 

The  shape  histograms  herein  described  may  easily  be  incrementally 
formed  (i.e.  updated  to  reflect  new  information),  if  we  fully  expand 
the  components  library  to  all  possible  H-C  primitives.  This  set  of  low 
level  primitives  then  becomes  our  basic  object  alphabet.  Individual 
objects  become  statistical  "words"  composed  of  these  letters  without 
order  (first  order  histograms),  with  binary  ordering  constraints  (second 
order),  etc.  We  really  never  need  to  encode  graphs  to  encode  object 
structure. 


5.6.  Conclusion 


We  have  described  a  modeling  system  which  is  "feature"  centered 
rather  than  "object"  centered.  Inis  method  has  the  good  features  of 
semantic  net  models  and  dimensional  models  combined.  We  have  discussed 
several  model  matching  methods  which  allow  for  automatic  model  building 
and  model  verification. 

Tne  method  we  propose  is  based  on  objects  modeled  as  2-d  and  3-d 
half  chunk  graphs.  These  graphs  are  relational  networks  of  elemental 
curvature  samples.  The  basic  model  is  free  of  scale,  coordinate  system, 
or  rotation  dependencies  (however,  XT Z  atom  information  may  be 
incorporated  to  bind  the  half  chunks  to  a  specific  three  dimensional 
space).  This  allows  model  comparisons  without  any  reparametriaation. 

To  speed  the  recognition  process,  allow  for  matches  on  partial 
data,  and  organize  search,  we  have  describe  a  histogram-based  object 
graph  matching  scheme.  This  method  may  be  extrapolated  to  actual  object 
modeling  by  histograms. 
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Cnapter  6 

Contributions  And  Future  Wor* 


6.1.  Contributions 

This  work  first  and  foremost  is  an  attempt  at  constructing  a 
complete  visual  understanding  system:  from  initial  measurement  and 
segmentation  to  object  learning  and  recognition. 

We  have  contributed  two  new  edge  detection  schemes  (Chapter  2),  a 
new  region-based  edge  reinforcement  process  (Cnapter  3),  and  a  new 
algorithm  for  boundary  and  vertex  codification  (Chapter  3). 

The  most  significant  contribution  is  the  half  chunk,  half  chunk 
graph,  and  the  histogram-based  graph  comparison  method  (for  object 
library  maintainance) .  The  H-C  graph  is  significant  because  it  points 
the  towards  "feature"  centered  object  models  instead  of  "object" 
centered  models  (Chapter  4).  Tne  histogram-based  modeling  system  is 
significant  (Chapter  5)  because  it  point  towards  a  method  of  processing 
visual  data  that  could  unify  object  recognition,  texture  discrimination, 
edge  detection,  and  region  aggregation  (all  as  histogram 
characterization  processes). 

A  smoothing  method  (Appendix  B),  a  method  for  threshold  detection 
(Appendix  C),  and  a  feature  based  labeling  scheme  (Appendix  D),  have 
also  been  developed. 
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Figures  6.1  and  6.2  show  a  block  diagram  of  our  system  operating  in 
non-textured  black  and  white  environments  and  in  colored-textured 
environments . 
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6.2  future  Mork 

The  greatest  speed  and  performance  improvements  will  come  from  work 
on  the  first  few  processing  steps,  i.e.  edge  measurements  and 
region-based  processing.  Tnese  both  now  work  with  63003  pixels  per 
image  and  this  resolution  will  grow.  We  now  require  approximately  23 
minutes  per  frame  for  these  steps  and  only  5  minutes  per  frame  for  all 
our  graph-based  manipulations  (this  is  even  more  significant  if  you 
consider  that  the  23  minute  phases  are  hand  coded  to  execute  as  fast  as 
possible,  whereas  the  5  minute  phases  are  implemented  quite 
inefficiently).  By  the  time  we  have  reduce  our  processing  to  edges,  we 
have  nominally  2000  points.  After  V-S-3  formation  hundreds  of  things 
remain.  By  the  time  of  object  identification  we  very  rarely  have  more 
than  10  individual  structures  left  to  process. 

Reliability  is  still  limited  most  by  how  well  initial  significant 
differences  can  be  measured.  Cameras  need  better  dynamic  range,  spatial 
resolution,  and  distortion  characteristics  (spatial  and  intensity 
ranges) . 

Me  see  no  good  reason  why  edge  information  cannot  be  directly 
encoded  into  half  chunk  form  without  the  intermediate  stage  of  the  V-S-S 
graph.  Me  also  see  no  reason  why  more  relaxation-based  processing  could 
not  be  incorporated  into  a  single  program,  allowing  richer  interaction 
between  the  high  and  low  levels  of  the  system  (particularly,  with 
respect  to  depth,  evidence  indicates  that  humans  can  use  depth 
information  at  a  lower  level  for  segmentation  aids  prior  to  object 
formation) . 

Me  feel  that  distributed  computing  architectures  would  greatly 
Improve  the  real  time  performance  especially  at  the  low  end  where  help 
is  critical.  Tnere  should  never  be  any  reason  for  a  general  purpose 
serial  machine  to  evaluate  a  function  over  every  pixel  site. 

Me  feel  this  effort  towards  a  bottom-up  based  system  is  a  mixed 
success.  Ms  have  extensively  tested  our  methods  for  edge  and  region 
detection  and  formation.  These  are  quite  successful.  Our  texture 


processing  has  been  done  for  only  a  few  examples  and  runs  too  slowly  for 
extensive  testing.  It  currently  xnows  about  edge  densities  witn  and 
without  directions.  The  graph  matching  methods  for  depth  and  motion 
wor’o  well  for  objects  having  some  sharp  curvature  discontinuities  but 
not  so  well  for  things  with  very  regular  structure,  like  spheres.  For 
these,  a  "half  chunk"-like  structure  for  expressing  centers  of  object 
symmetry  are  a  possible  solution.  Me  would  propose  a  system  which  uses 
both  boundary  generated  half  chunks  and  symmetry  axis  generated  half 
chunks.  Also  things  with  no  regular  dimensional  structure,  like  trees, 
cannot  be  modeled.  Inis  problem  would  probably  best  be  solved  by 
incorporating  a  half  chunk  based  system  within  a  more  global  semantic 
net  based  system  (possibly  coupled  to  a  natural  language  system).  It 
has  been  difficult  to  process  enough  data  for  more  than  about  eight 
objects,  so  the  modeling  system  has  not  been  extensively  tested.  Me, 
therefore,  can  only  speculate  on  performance  with  very  large  object  data 
bases . 
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Appendix  A 


Edge  Detectors 


A.1.  Haeckel  Edge  Detector 

This  edge  detector  is  a  direct  implementation  in  BLISS-10  of  the 
algorithm  in  [23]  Appendix  A,  with  fixes  indicated  in  [24],  The 
operator  is  designed  to  work  on  a  digitized  disk  of  radius  4.5  (diameter 
of  9  cells).  It  allows  detection  of  steps,  dark  stripes,  and/or  light 
stripes.  In  these  experiments,  we  have  restricted  the  algorithm  to 
steps. 

The  operator  attempts  to  explain  all  activity  in  a  disk  in  terms  of 
the  superposition  of  a  set  of  orthogonal  functions.  The  intensity 
function  is  expressed  as  a  series  expansion  in  terms  of  the  orthonormal 
set.  A  least  squares  criterion  between  the  actual  intensity  function 
and  the  orthonormal  expansion  is  used  to  compute,  approximately,  the 
coefficients  for  each  term.  From  this  the  edge  type,  position,  and 
direction  is  computed.  For  complete  details  the  reader  is  referred  to 
the  literature. 

A. 2.  Psuedo-Hueckel  Detector 

Tnis  operator  is  based  in  the  same  sort  of  mathematics  as  is  the 
original  Hueckel  operator.  Tne  intensity  variation  in  a  square 
neighborhood  of  9x9  is  expressed  as  a  superposition  of  orthonormal 


functions.  In  [43],  it  is  shown  that  for  detecting  the  direction  («f) 
of  any  step  in  a  window  the  functions  of  figure  A.1  are  sufficient. 
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Orthonoraial  Set  For  Direction  Computation 
Figure  A. 1 

Our  simple  refinement  to  the  procedure  in  [43],  to  allow  detection 
in  non-binary  pictures,  consists  of  computing  the  minimum  and  the 
maximum  in  the  window,  while  computing  the  components  of  the  gradient. 
We  then  select  a  threshold  of  T=(MAX-MIN)/2+MIN.  [43]  suggests  that 
with  such  a  threshold  T,  we  can  compute  the  edge  line  (i.e.  the  edge 
position),  given  the  direction  ( •(. )  and  the  number  of  cells  greater 
than,  or  less  than  the  threshold.  While  this  is  true,  we  found  that  it 
was  actually  easier  to  implement  the  position  computation  by  fitting  the 
least  squares  line, 
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using  the  points  where  the  intensity  flips  from  one  side  of  the 
threshold  to  the  other  side  of  the  threshold. 
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Tots  modified  operator  is  significantly  faster  than  Hueckel’s  and 
seems  equally  sensitive.  Each  pixel  in  the  square  area  is  accessed  two 
times.  Once  to  compute  MIN,  MAX,  and  gradient  components.  Then  a 
second  time  to  acquire  threshold  crossover  points,  therefore  complete 
the  edge  line  information. 

A. 3.  Linear  Difference  Detector 

This  operator  was  used  by  [7]  in  his  edge  detection  apparatus,  and 
was  for  us  3  jumping  off  point  in  our  edge  detection  experiments.  Our 
difference-based  system  is  really  an  array  of  linear  difference 
detectors  with  a  better  peak  selection  algorithm. 

A  six  cell  strip  (vertical  or  horizontal)  is  used  to  compute  a 
difference  (three  positive,  three  negative).  This  difference  is 
compared  to  its  nearest  neighbors  (Burr  used  the  neighbors  on  both 
sides,  we  use  several  neighbors  on  each  side).  If  it  is  a  relative 
minimum  or  maximum,  an  edge  is  marked  in  its  position.  Burr  went  one 
step  further.  He  examined  the  difference  in  the  orthogonal  direction  to 
compute  in  angle  for  the  edge  (as  in  the  psuedo-Hueckel  operator).  We 
simply  mark  vertical  or  horizontal  edges. 

A. 4.  Rosenfeld  Non-Linear  Edge  Detector 

This  Is  a  direct  implementation  of  the  edge  detection  method 
described  in  [52]  and  (533.  This  system  computes  an  array  of  averaged 
pictures,  four  in  all  for  our  implementation.  The  block  sizes  used  in 
the  averaging  operations  are  2x2,  4x4,  8x8,  and  16x16.  Rosenfeld  also 
suggested  1x1  and  32x32,  however  we  found  these  to  be  unnecessary  (ixi 
is  smaller  than  the  quickest  step  transition  possible  from  our  vldicon 
system,  32x32  is  larger  than  required  to  average  any  noise  pattern  in 
our  tests). 

Then  differences  are  formed  in  the  vertical  and  horizontal 
directions  (Rosenfeld  suggests  diagonals,  but  for  computational 
efficiency  sake,  we  use  only  two  directions).  At  each  averaged  size,  we 
used  maxima  suppression  to  remove  (set  to  zero)  all  points  that  are  not 
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maxima  or  minima  within  one  averaged  block  area.  Additionally,  tne 
differences  not  suppressed  must  bs  greater  than  a  confidence  T. 

Me  now  examine  the  differences  with  respect  to  those  in  other 
averaged  sizes.  Me  select  the  difference  for  which  then  next  smaller 
size  does  not  give  a  significantly  higher  absolute  difference.  If  E(i) 
is  a  difference  for  the  averaged  size  i  x  i  and  E(i/2)  is  the  maximum 
difference  for  the  next  level  smaller  (the  maximum  non-suppressed 
difference  in  the  area  covered  by  the  i  x  i  sized  neighborhood  at  the 
(i/2)  x  (i/2)  sized  level),  then  "significantly  higher"  means: 

A. 4.1.  £(i)  <  aE(i/2)  <  (a»«2 )E(i/(2*»2 ) )  < 

...  <  (a«*n)E(i/(2"»n)) 

but,  £(i/(2»»n))  >  aE(i/(2»»(nf1))) 

where,  a_J/4 

An  edge  is  marked  in  the  position  of  this  selected  difference.  The 
(2B#n)  term  will  never  exceed  8  (n=3),  in  our  implementation.  At  n=3, 
the  selection  process  automatically  selects  the  difference  (i.e. 
(i/(2**(n+l)))  is  never  evaluated  when  n=3). 

For  further  details  we  refer  the  reader  to  [52]  and  [53]. 

A. 5.  Yakimovsky  Edge  Detector 

Me  have  adopted  the  edge  strength  function  proposed  by  [66],  but 
not  the  region  growing  algorithm.  This  is  done  because  we  wish  to 
examine  edge  detection  techniques,  not  region-based  processing. 
(Althougn,  the  region-based  techniques  may  be  interesting  in  comparison 
to  those  described  in  Chapter  3»  There  are  similarities  between  our 
method  of  expanding  edge  neighborhoods  for  string  and  vertex  detection 
and  the  growing  rules  used  by  Yaklmovsky.  By  not  being  constrained  to 
be  one  pass,  our  system  is  capable  of  doing  better  in  some  cases.) 

Me  have  simply  coupled  Yaklmovsky' s  edge  strength  function  to  a 
non-maxima  suppressing  edge  detection  scheme.  Tne  strength  function  is 
given  in  Chapter  2,  for  reference  to  the  more  standard  "expressions  for 
variance  and  mean  comparisons. 


me  suppression  algorithm  used  is  liice  the  one  previously  described 
for  the  linear  difference  detector,  rfe  scan  the  operator  neighborhoods 
(four  suggested  in  [66]  -  see  figure  A. 2)  horizontally  and  vertically, 
marking  edges  at  each  point  where  the  output  is  maximum  within  one 
operator  neighborhood  area.  Tne  only  complication  is  that  we  select  our 
result  from  the  operator  having  the  largest  strength,  and  suppress  the 
others  within  the  saae  area.  YaKimovsiey  seeas  to  be  suggesting  that 
several  differently  shaped  neighborhoods  might  allow  object  shapes  to  be 
followed  sore  effectively,  considering  that  his  technique  requires  a 
relatively  large  nuaber  of  saaples  (to  compute  the  variance  accurately 
enough).  In  fact,  from  a  purely  statistical  point  of  view  it  can  be 
argued  that  even  larger  neighborhood  sizes  might  be  desirable. 


Yalcimovsxy 's  Edge  Detection  Neighborhoods 
For  Horizontal  Edges 
Figure  A. 2 


Appendix  B 


Image  Smoothing  and  Enhancement 


B. 1 .  Image  Averaging 

The  basic  images  taxon  from  the  C.3.L.  silicon  vidicon  imaging 
system  are  252x238  six  bit  pixels  each,  plus  or  minus  one  level.  For 
low  contrast  images,  only  64  levels  is  not  adequate  (especially  when 
performing  gradient  related  operations).  We  have  remedied  this  problem 
by  averaging  eight  252x233x6  bit  images  to  form  a  252x238x9  bit  one. 
When  the  vidicon  signal  is  near  a  quantization  level,  small  noise 
variations  can  cause  plus  or  minus  one  level  digitization  errors  (the 
analog  signal  in  our  system  can  be  modeled  by  the  sum  of  the  "ideal" 
signal  ♦  N{0.0,"0.5)  -  see  Figure  B.i).  If  wa  sum  several  Images 
together,  we  can  effectively  get  new  levels  "between"  the  old  ones  in 
single  images,  thus  a  finer  quantization  of  the  vidicon  signal.  This 
only  worxs  because  there  is  additive  noise  in  the  vidicon  image  signal. 
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VIdicon  Signals,  Quantizations,  and  Averages 
Figure  3. 1 


B.2.  Image  Smoothing 


We  can  further  improve  image  quality  by  smoothing.  This  generally 
consists  of  replacing  every  pixel  by  the  average  of  all  the  pixels  in  a 
given  neighborhood.  While  this  does  improve  "noise"  variations  (by 
replacing  a  random  variable  with  A*  and  ^  by  a  new  random  variable  with 
M  and  ),  it  also  blurs  genuine  signal  variations. 

M 

To  get  around  this  problem,  we  designed  a  non-linear  smoother  based 
on  the  maximum  likelihood  decision  between: 
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This  decision  is  evaluated  about  each  pixel  for  each  of  its 
adjacent  neighbors,  iteratively  for  several  whole  picture  cycles,  rfe 
have  experimented  with  several  methods  for  cutting  off  the  iteration 
process.  The  first,  and  simplest,  is  to  iterate  a  fixed  numoer  of 
times.  For  our  typical  images  we  have  found  5  iterations  to  be 
adequate.  Second,  we  have  allowed  iterations  to  continue  until  the 
overall  6  of  the  image  fails  to  become  smaller,  indicating  signal 
degradation.  This  method,  for  typical  images,  runs  between  3  and  6 
cycles  before  stopping.  The  last  method  used  stops  cycling  when  6  , 
for  g(t)-g(t-l),  fails  to  decrease.  This  method  is  lixe  the  second  one, 
except  that  it  is  less  sensitive  (in  terms  of  number  of  iterations)  to 
random  changes  in  image  content  (i.e.  various  genuine  signal  related 
differences).  Figures  B.2-B.5  show  slices  of  a  six  bit  image,  a  nine 
bit  image,  and  a  nine  bit  image  after  smoothing  has  been  performed  (5 
and  10  cycles). 


g(t-l)  a  image  at  iteration  t-1 
g(t)  =  image  at  iteration  t 
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A  Six  Bit  Image  Slice 
Figure  B.2 
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A  Nine  Bit  Image  Slice 
Figure  B.3 
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Appendix  C 

Selection  of  Thresholds  in  a  Segmentation  Process 


Segmentation  processes  generally  consist  of  clustering  many 
measurements  from  similar  objects  together,  or  conversely  separating  a 
few  highly  dissimilar  measurements  from  a  large  sea  of  similar  ones.  If 
we  have  a  model  for  expected  measurement  error,  we  can  form  a  function 
which  relates  any  particular  measurement  value  to  the  probability  that 
that  measurement  represents  a  noise  variation  or  a  signal  based 
variation.  Me  then  can  select  an  acceptable  threshold  for  separating 
"discontinuous"  phenomenon  from  "continuous"  phenomenon.  (In 
discretized  spaces  each  measurement  may  be  considered  discontinuous  from 
the  next.  However,  if  a  piecewise  continuous  function  was  sampled  to 
form  the  discrete  function,  this  definition  of  continuity  is  not  useful. 
We  wlsn  a  definition  of  discontinuity  such  that  points  at 
discontinuities  in  the  underlying  continuous  space  function  also 
represent  discontinuous  points  in  the  sampled  function.) 

First  we  need  to  define  a  difference  variable.  For  edge 
detection/region  growing  this  is  simply  the  intensity  difference 
computed  from  adjacent  averaged  intensity  neighborhoods  (the  most 
primitive  being  single  pixels).  Possibly,  different  neighborhood 
orientations  (such  as  vertical  vs.  horizontal  differences)  should  be 
considered  drawn  from  different  populations  (therefore  handled  as 


separate  seta  of  data).  For  significant  angle  difference  detection 
(corner  detection),  angle  differencea  can  be  used.  For  depth 
correlation  errora,  depth  differencea  from  adjacent  connect  (in  one  or 
the  other  atereo  image  pair a)  depth  aaaociated  neighborhooda  define  a 
difference  variable.  A  good  difference  variable  ia  a  zero  mean  random 
variable,  for  all  measurements  taken  from  a  similar  aet  of  data. 

If  we  assume  many  more  aimilar  readlnga  (i.e.  many  more  difference 
altea  which  are  aaaociated  with  no  edge,  than  difference  aitea  which  are 
locationa  of  edgea),  than  diaaioilar  readlnga  we  may  eatimate  the  noiae 
diatribution  functiona  uaing  an  one  of  aeveral  atandard  techniquea  (by 
aaauming  all  the  readlnga  repreaent  noiae  variation  -  the  number  of 
noiae  readlnga  muat  be  >>  than  the  number  of  algnal  readlnga).  For 
almple  diacrete  cases  we  can  use  either  a  binomial  model  (good  because 
there  may  be  a  position  in  the  distribution  where  no  noiae  will  be 
detected  as  signal)  or  a  normal  model  (good  because  it  can  be 
characterized  easily  by  two  parameters  the  mean  f4  and  the  variance 
6  ).  For  more  complex  diatribution  types  we  may  use  a  Jtonte  Carlo 
method. 

Armed  with  a  typical  noiae  distribution  we  may  select  a  threshold 
which  labels  a  particular  measurement  as  a  noise  point  with  a  known 
error  probability  p  (Figure  C.i).  We  need  not  threshold  negative 
differencea  at  the  same  significance  level  as  positive  ones,  but  we 
generally  would  expect  to  do  so.  This  method  for  threshold  selection  la 
related  to  several  goodness-of-flt  methods  commonly  described  in  the 
literature  (30]. 
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Estimates  of  the  Distribution  of 
Single  Adjacent  Pixel  Differences 
(An  Example  Difference  Variable) 
Figure  C.l 


As  a  final  object  segmentation  ‘technique,  we  resort,  to  a 
"Waltz-like"  labeling  scheme  for  assigning  types  to  regions  and  half 
chunks.  Me  are  not  interested  in  the  intimate  descriptions  of  line  type 
that  Waltz  himself  obtained,  but  rather  want  to  separate  the  half  chunks 
into  three  major  types.  These  are  lighting  derived,  true  object 
derived,  and  occlusion  derived.  The  first  and  the  third  types  are 
grouped  and  marked  so  as  to  be  ignored  during  the  object  recognition 
phase  of  processing.  Tne  meaning  of  lighting  derived  types  is 
self-evident  (we  mean  highlight  or  shadow  related  half  chunks,  when  the 
half  chunk  does  not  also  coincide  with  an  object  related  boundary). 
Occlusion  derived  types  (UNLINKED-  types)  are  generated  at  what  Waltz 
called  "occluded"  or  "occluding"  boundaries  (Figure  D.l).  Regions  or 
surfaces  are  labeled  as  shadow,  highlight,  ground,  or  object.  Ground 
represents  the  largest  background  region  (or  group  of  regions). 
Generally  this  is  the  tabletop,  or  backdrop. 


V  \  y  i  V 


Figure  D. 1. 
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Some  Examples  of  waltz  Labelings  and 
H-C  Labelings 
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We  begin  our  labeling  algorithm  on  a  H-C  graph  naving  each  H-C 
assigned  every  possible  label  type.  Iteratively  we  apply  vertex-based 
constraints,  then  surface  based  constraints.  All  half  chunks  connect  in 
X-link  rings  are  processed  Jointly  as  vertices.  All  half  chunks  connect 
via  R-links  (alternatively,  via  E-link  rings)  are  processed  Jointly  as 
surfaces. 

Each  vertex  label  type  applicable  to  a  leg  of  a  H-C  is  an  "on"  bit 
in  the  H-C  type  field  for  that  leg  (initially  all  are  on),  We  scan 
through  our  vertex  list  for  all  matching  vertices  (same  geometrical 
class,  degree,  and  having  compatible  labelings),  and  accumulate  the 
logical  OR  of  these  labelings  (equivalent  to  cons' ing  a  list  of  all 
possible  vertex  matches).  Possible  labelings  are  ANDed  with  the  initial 
H-C  labelings.  In  the  first  iteration,  this  operation  eliminates 
labelings  Incompatible  with  certain  boundary  feature  properties. 

Each  E-link  (connection  along  the  boundary)  is  examined  for 
adjacent  H-C  type  compatibility.  This  causes  the  labelings  on  either 
side  of  the  linkage  to  be  the  logical  AND  of  the  labelings  prior  to  this 
processing.  Tnis  operation  is  equivalent  to  computing  the  intersection 
of  the  the  two  label  sets. 

After  the  vertex-based  processing  (processing  equivalent  to  Waltz's 
"filtering"  algorithm),  surface-based  constraints  are  applied.  Each 
surface  aggregation  is  labeled  as  shadow,  highlight,  ground,  or  object, 
by  matching  indicator  label  types  on  all  the  H-C's  encoding  the  surface. 
Then  the  labels  which  are  incompatible  with  the  region  type  are  removed 
from  each  H-C. 

These  operations  continue  until  a  specified  number  of  iterations 
are  concluded,  or  until  steady  state  is  reached  (generally  the  later). 
Tnls  labeling  scheme  is  unique  for  three  reasons.  First,  it  is  based  on 
half  chunks  which  are  themselves  unique.  Second,  it  Incorporates  region 
constraints  as  well  as  boundary  constraints.  Finally,  initial  labelings 
are  assigned  and  constrained  by  directly  measured  feature  properties  in 
addition  to  semantic  types  (the  region  and  boundary  labelings).  In  some 
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sense  the  vertex  geometrical  type  could  be  considered  as  one  o f  these 
measured  feature  properties  (convex,  concave,  straight). 

0.2.  Nomenclature 


Soundary  Segment  (H-C  Legs)  Mnemonics: 


LK 

Linked 

UP 

Unlinked* 

UM 

Unlinked- 

HP 

Pure  Highlight* 

HM 

Pure  Highlight- 

SP 

Pure  Shadow* 

SM 

Pure  Shadow- 

GR 

Unlinked-,  Ground 

US 

Unlinked-,  Shadow* 

UH 

Unlinked*,  Highlight* 

MH 

Unlinked-,  Highlight* 

MS 

Unlinked*,  Shadow* 

L3 

Linked,  Shadow* 

LH 

Linked,  Highlight* 

Feature  Property  Mnemonics: 


DIFF  Boundary  Diffusion 

LGTA  Absolute  Lightness 

LGTR  Relative  Lightness 

TANG  Tangency  Difference 

RAT  Ratio  Variance 

VARY  Side  Variance 

REFL  Reflective  Variance 

TEX  Roughness 

REAL  Real/Subjective  Index 

DEEP  Dep.h  Variance 

MOVE  Motion  Variance 

EMBD  Region  Embeddedness  Index 


Feature  Property  State  Suffix: 

#  Set  to  FALSE  (no  variation,  not  diffused, 

lightest,  light>dar<,  most  embedded) 

&  Set  to  MAYBE  (not  determinable, 

object  level  embeddedness) 

$  Set  not-appliesble-property 

<none>  Set  TRUE  (variation,  diffused,  not  lightest, 

dark>light,  ground  level  embedding) 

Half  Chunk  Angle  Properties: 


C 

V 

Z 

? 


Concaved  (>,160  deg 
Convex  (<  180  deg.T 
Zero,  Straight  (»  l 
C,  V,  or  Z 


.) 

30  deg.) 


Region  Label  Format: 

(  Indicator  Label  Concatenation  , 

Compatibility  Label  Concatenation  )  [comments] 
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Vertex  Label  Format: 

(  (  H-C  List  0  )  (  H-C  List  1  )  ...  (  H-C  List  n)  )  [comments] 
H-C  List  Format: 

(  H-C  Angle  Property  ,  (  Leg  List  0  )  ,  (  Leg  List  1 )  ) 

Leg  List  Format: 

(  Feature  Property  Concatenation  , 

Boundary  segment  labels  Concatenation  ) 

■•Concatenations  are  mnemonics  and  suffixes  separated  by  (OR) 
or  (NOT(AND)  -  legal  in  label  concatenations  only) 


D.3.  Example  Label  Definition  File 


[INITIAL  REGIONS  -  2- JAN-79] 


(LK+UP , LK+UP+UM+SM+HM)  [OBJECT] 
(GR,GR) . .  .  [GROUND] 


HP+UH+LH , LH+HP+UH+MH )  [PURE  HIGHLIGHT] 
SP+tfS*LS,M3+U3>3P+L3)  [PURE  SHADOW] 

»  ) 


[INITIAL  VERTICES  -  2-JAN-79] 

[2-DEG] 

(  (?,  (D£EP#-*-MOV£#*DIFF#+EMBD#,LK) ,  (D3EP#+M0V£#4-DIFF#>EMBD#,LK) ) 

(?, (D££P#+M0V£#+DIFF#+£M8D#,LK) , (DEEP#+MOV£#+DIFF#+EMBD#,LK) )  ) 


(  (C,  (DSEP#-KM0VS#+DIFF#+ErtBD#+LGTR#,LK),(D3£P#*M0VE#+DIFF#+ErtBD#<. 

LGTR#, LK) ) 

(V, (DSEP#+MOVE#+DIFF#+EMBD#+LGrA#+LGTR,LH) , (DEEP#+MOV£#+DIFF#+EMBD#+ 
LGTA#+-LGTR,LH))  ) 


(  (Z,  (DEEP #+M0 VE  #+DIFF#+EMBD#*LGTR # ,  LK ) ,  (DEEP#+MOVE#-fDIFF#+E.«lBD#+ 

LGTR#, LK) ) 

( Z,  ( DEEP #+tf 0 VB #+DI FF #+Sf1BD  #+LG T A #+LGTR ,  LH ) ,  ( D£SP#+MOV£#+DIFF#+SMBD#+ 
LGTA#>LGTR , LH ) )  ) 


(  (? ,  (DE£P#+MOVE#+DIFF#'*-SMBD#1  LS ) ,  (D£EP#+MOV£#-*-DIFF#+-ErtBD#,LS) ) 

( ? ,  (  DSSP#+MOVE#+DIFF #+EMBD# ,  LK ) ,  ( DSEP#-*.MOVE#+DIFF#+ErtB D# ,  LK ) )  ) 


(  (? ,  (DSEP#+MOVE#+DIFF#+SrtBD#,LS)  , (DEEP#-*-MQV£#+DIFF#*EMBD#,L3 ) ) 
( ? ,  (DSEP#+MOVE#+DIFF#-t’EiHBD# ,  L3 ) ,  (  DSEP#+MOV£#+DIFF#+SMBD# , Li 


))  ) 


(  (Z,(D5EP#+M0VE#+DIFF#+EMBDlM.GrA#+LGrRfLH),(DSEP#+M0V£#+DIFF#+ErtBD#+ 
LGTA#+LGTR,LH)) 

(Z,(DSEP#>M0VE#-t-DIFF#f£MBD#-i.LGrR#,LS)l(DSEP#+M0Vfi#+DIFF#*ErtBD#4. 
LGTR#,L3))  ) 

(  ( V ,  ( DSEP#+MOV£#-fDIFF  #+EMBD#+LGTA#+LGTR ,  LH ) ,  ( DE£P#+M0VE#+DIFF#+Ef1BD#+ 
LGTA#+LGTR , LH ) ) 

(C,(DSEP#^M0VE#-t.DIFF#^EM3D#*LGTR#,L3),(DSEP#-KM0VE#+DIFF#4>EM3D#4. 
LGTR#,LS) )  ) 

(  ( ? .  (DEEP#+MOVE#+DIFF#*£MBD# ,  UP ) ,  ( DSEP#+M0VE#+DIFF#+SM8D#, UP ) ) 

( ?, (EMBD, GR ) , (EM8D.GR ))  ) 

(  (? ,  (DE£P#+M0V£#+DIFF#«-EM8D#,MS) ,  (DSSP#*MQV£#+DIFF#4-SMBD#,MS) ) 

(? , (Erf3D,GR ) , (EMBD,GR) )  ) 

(  (Z, (DSEPP+MOVEf+DIFFf+SMBDfr+LGrAt+LGrR ,UH) , (DSEP#*MOV£#+DIFF#+SMBD#* 
LGTA#*LGTR , UH  ) ) 

(Z,  (£MBD*LGTR#,GR ) , (EMBD+LGTR#tG3 ) )  ) 
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(  (V ,  (DEEP#+MOVE#+DIFF#+EMBD#+LGTA#*LGrR,UH) ,  (D2EP#+M0VE#+0IFF#«-Ei4BD#+ 
LGTAI+LGTR ,  UH ) ) 

(C,(EMBD+LGm,GA),(EMBD+LGTR#,GR))  ) 

(  (? ,  (DS£P#+MOVS#+DIFF#+£riBD> ,  UP ) ,  (  DEEPO+MOVEA+OIFFI+EMBD#  ,  UP ) ) 

(?,  (DIFF#+£MBD#,Ut4) ,  (DIFF#+EM8D#t  IM) )  ) 

(  (?,(D3BP#4M0V5#*DIFF»+B<3D#+LGrR#lUP)l(D3£P»«M0V2#+DIPF»+&lBi>»+ 
LGTRtf.UP)) 

(?,  (DIFF#+£M8D0«-L3rA#+LGrR  ,MH) ,  (DIFF#+£HBDA*L3rA#-*-L.GrR  ,Mrf) )  ) 

(  (?, (D2SP#+M0V£#+DIFF#+2MBD#,MS ) ,  (DE£P#-K40V£iM)IFF#+£rtBD#,<43) ) 

(? ,  (DIFF#+£HBD#,U3)  ,  (DIFF#+£tHBD#,U3) )  ) 

(  (? .  (D££P#-*-MOV£#+DIFF#-f£MBO#,  UP ) ,  (DEEP#-*-M0VE?+DIFF#+Ei4BDA, UP ) ) 

{ ? ,  (OIFF#+EriBD# , U3 ) ,  ( DIFfKeMB 0# , U3 ) )  ) 

(  (?1(DEEP#+MOV£#+DIFF#*£M3D#.MS).(DEEP#«.MOV£#*DIFF*+E<4BD#tMS)) 
0,(DIFF#>£MBD#,UM),(DIFr  kEMBW.UH) )  ) 

(  (?,(DSSP#+MOV£#+DIFF**2rfBD#+LGTR#,MS),(DS£P#*rtOV£#*DIFF#*ErfBD#i. 
LGTR#,MS) ) 

(?,  (DIFF#-f£MBD#+LGTA#>L.GrR ,MH) , (DIFF#+£MBD#+LGrA#-*-LGrR ,MH) )  ) 

(  (Z,(D£SP#+M0V£#+DIFF#+£MB0#+LGTA#+LGra,UH),(DSEP#+;40VE#+DIFF#4-£.4BD#+ 
LGTA#+LGrR,UH)) 

(Z,  (DIFFA+£MBD#+LGXR*,Urt),(DIFF#+SrtBDfi-LGTR#fUM))  ) 

(  (V ,  (DSEP#+MOVEtf>DIFF#+EMBDA+LGrA#+LGrR , UH ) ,  (DE£P#+M0V£A-*-DIFF#+2i48D#+ 
LGrA#+LGrR , UH ) )  . 

(C,  (DIFF #+-£MBD#+LGrR#,  UM) ,  (DIFF#+EMBD#-t-LGTR#,  UM) )  ) 

(  (Z,  (DSEP0+MOV£#+DIFF$+£MBD#'*-LGTA#,UH ) ,  (D££P#+MOV£A«-DIFF#+£i4BD#-*- 
L.GTA#,  UH) ) 

(Z,(DIFF#+El4BDM,GTA#,MH),(DIFF#+EMBD#+LGrA#,MH))  ) 

(  (V ,  (DEEP#+-MOV£#+DIFF#-*-EMBD#+l.GTAAf  UH) ,  (DE£P#+MOVEA+DIFF#+Ei4BD#+ 

LGTA#, UH ) ) 

(C,  (DIFF#>£MBDA+LGrA#,MH) ,  (DlFFtf+Ei4BDA+LGrA#,rtH) )  ) 

(  (Z,(D£EP#-vMOV£#+DIFF#+EMBD#+LGrA>+LGrR,UH),(DE£P#-*.MOV£#>DIFFI+£r4BD#+ 
UGT A #-»-L.Grfl  ,UH) ) 

(Z,( DIFF #+SMBdKlGIR#, US ),( DIFF #+EMBD*+LGrR*,U3))  ) 

(  (V,(DSEP#+MOVE#+DIFF#-t-EMBD#-t-LGrA#>LGrR,UH),(DSSP#^MOVS#>iLiIFF#-*-SrtBD#-f 
LGTAA+LGTR , UH ) ) 

(C,  (DIFF#^Ei4BDi»+LGrR#,U3 ) , (DIFF #+EMBD#+-L.GTR# , U3 ) )  ) 

(  (? , (EMBD.GR ) , ( EMBD.GR ) ) 

(?,(SM8D,GR), (EMBD.GR))  ) 

(  (Z.  (DIFF+LGrAA-t-LGrR+SMBDA , HP) ,  (DIFF+LGTAI+LGIR+EMBD#, HP) ) 

( Z, (DIFF+LGTR#, HM) , (DIFF+LGTRA.HM) )  ) 

(  (V,  (DIFF+LGrA^LGrR+EMBD#, HP) ,  (DIFF+L.GrA#4-LGrR+£MBD#,HP) ) 

(c, (DIFF+LGTR#, HM) , (DIFf+LGrR#,HM) )  ) 

(  (? ,  (DIFF-*-LGTA*LGTR#'*-DEEP#'vErtBDf,SP)J,  (DIFF«-L.GTA+LGrR#+D£EP#«-EHBD#,3P) ) 
{ ? ,  ( DIFF+LGTR+DSEP#*EMBD# ,  SM ) ,  ( DIFF+LGI  R+DSEP#«-EMBD# ,  SM) )  ) 

(  (? ,  (DIFF+U3TA+LGTRA-*-D££P#-*-ErtBD#,3P) , (DIFF ♦LGrA+LGTR#-*-0£EP#«-£HBD#,SP) ) 
(?, (EMBD.GR), (EMBD.GR))  ) 


C3-D30] 

[ 7-JUNCTIONS  ] 


(  (?. (EMBD.GR ). (EMBD.GR )) 

?,(em6d,gA),(em6d,gr)) 

(?, (EMBD.GR), (EMBD.GR)) 


[Y- JUNCTIONS] 


(  (V,  (DEEP#+M0VE#4-DIFF#+EMBD#.LK)  ,  (DSEP#+MOVS#+DIFF#+EMBD#.LK) ) 
{V,(D£EP#+MOVE#*DIFF#*£MB&#,LK),(D2£P#*MOV£#-t.DIFF#+EMB6#,LK)) 

( V ,  ( DS£P#+MOV£#+DIFF#+£MBD# ,  LK ) ,  ( DSEP#*MQVE#+DIFF#fEMBD# ,  LK ) )  ) 

(  (Vt(D£EP#>MOV£#+DIFF#+EMBD#,UP),(DE£P#-*-MOV£#-t-DIFF#«-EMBD#,UP)) 

(V, (EMBD.GR). (EMBD.GR)) 

(V, (EMBD.GR), (EMBD.GR))  ) 

(  (V, (EMBD.GR), (EMBD.GR)) 

[v,(DSEP#>M0V£#4.DIFF#-t-EMBDf,UP),(DSSP#-*-M0VS#>DIFF#4-SMBD#,UP)) 

( V ,  (DIFF#+£MBD# ,  US ) ,  ( DIFF+LuT A-t.LGrR#+DSEPf  >EMBDf ,  SP ) )  ) 

(  (V. (EMBD.GR), (EMBD.GR)) 

( V ,  ( DIFF+LGTA+LGTRf+DSEPF+EMBD# ,  SP ) .  ( DIFF#«-EMBD# ,  U3 ) ) 
(V,(DSEP#-hMOVS#-*-DIFF**EMBD#,UP),(D££P#+MOVE#-*-DIFF#4-EMBD#,UP))  ) 

(  (V.(EMBD.GR), (EMBD.GR)) 

(V,(DEEP#*MOV£#>DIFF#-*.SMBD#,UP),(DS£P^MOVE#+DIFF#+EMBD#,LK)) 

( V ,  ( DS£P#*MO V£#+DIFF#+EMBD# , LK ) ,  ( DESP#«-MOV£#+DIFF#«-SMBD# ,  UP ) )  ) 

[W-JUNCTI0N3] 


(  ( C ,  (DIFF#+EMBD# .  UM ) .  ( DIFF#+EMBD# ,  UM ) ) 

( 7 , ( dsep#+mov&#+6ipf#+embd# ,  up } ,  ( deep#** 

( V ,  ( D£EP#+MOV£#+OIFF#+£MBD# , LK ) ,  ( DEEP#*-.' 

(  (C. (EMBD.GR), (EMBD.GR)) 

( V ,  ( DE£P0+MOV£#+DIFF #+EMBD# , UP ) , ( DEEP#** 
( V ,  ( DSEP#+MOV£ #+DIFF#+EMBD» ,  LK ) ,  ( DEEP#-*-.1* 


a£P#4-MO/£#+DIFF#+EMBD#,  LK) ) 
S£P#-kMOVE#*DIFF#«-EMBD# , UP ) ) 


( DSEP#-*-MOV£#-*-DIFF #+EMBDf ,  LK ) ) 
(DE£P#-*-MOVE#-*-DIFF #+-£MBD#,  UP ) )  ) 


[T-JUNCTIONS] 


(  (Z. (EMBD.GR), (EMBD.GR)) 

[  V ,  ( DEEP#+MOVE#+DIFF#+EMBD# ,  UP ) ,  ( DSEP#*MOVE#+DIFF#-.-£MBD# ,  LK ) ) 
(V,  (DSEP#+M0V£#*DIFF#4-£MBD#(  LK) ,  (DSSP#+MOVE#+DIFF#-*-EMBD#,  UP ) ) 


(  (Z. (EMBD.GR). (EMBD.GR)) 


i\u(iwv>  /  f  V  W(  <W  v  |  uu  /  / 

V, (DIFF+LGrA+LGrRf+DEEPI+EMBDf.SP) , (DIFF#*EMBD#,U3 ) 
V ,  (DSEP#^MOV£#>DIFF#t-EMBD# ,  UP ) ,  (DEEP#*M0V£#4.DIFF  #«■& 


MBD# ,UP ) )  ) 


(  (Z.(  EMBD.GR), (EMBD.GR)) 

(V,[DBEP^MOV£#+DlFF#«-SMBD#,U?),(DS£P#*MO\f£f.*-DIFF#-*-SMBD#,  UP)) 

( V, (DIFFI+SMBD#, US ) ,  (DIFF4L<SrA4-LGrR#+D££P#*-EMBD#,SP) )  ) 

(  (Z, (EMBD.GR), (EMBD.GR)) 

V,(DSEP#*M0V£#4.DIFF#*EMBD#,MS).(DIFF+LGrA4.LGrR#«-DEEP#«-EMBD#lSP)) 
(V,(DIFF-*-LGrR-*-DSEP#-*-EMBDI,SM),(DSEP#-*.MOV2#+DIFF#«-£MBD»,UP))  ) 

(  (Z, (EMBD.GR ) ,( EMBD.GR ) ) 

( V, (D££P#+MOV£#+DIFF#+SMBD#,UP ) , (DIFF+LGrR+DEEPI+EMBDf ,SM) ) 
(V,(DIFF>LGTA+LGrR#+DEEP#*£MBD#,SP),(DEEP#4.M0VE^DIFF#^SMBD#,MS))  ) 

(  (Z. (EMBD.GR ) , (EMBD+LQTR#,GR ) ) 

(v,(DE£P#-KMOV£#*DIFF#*£MBD#+LGrA#*LGrR,UH),(DIFF*LGTA#«-LGTRf£M8D#,HP)) 
(V,(DIFF+LarR#,HM)I(DEEP#+MOV£#+DIFF#+EMBD#,UP))  ) 
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(  (Z, (EHBD>LGrR#.GH ) , (EMBD.GR) ) 

(V,(D£EP#^0$£#«-6lFF#*a>lBD#.UP).(DIFF-*.L3rR#JHM)) 
( V ,  ( DIFF+LGrA#+LGrR^EMBD#,  HP ) ,  ( DS£P#*M0VE#«.6lFF# 


TUU  ill  w  g  ***-1  /  / 

:40V S »>6lFF«o£rfBD#*-LGr A#+LGTR ,  UH ) )  ) 


(  (Z.(D2EP**MOVfi#*DIFF#+KMB0#.LK).(DSEP#*MOV£#*DIFF#*£rtaD#*LGra#.LK)) 

(V,TDE£P#fM3V£#i.DlFF#i-£4B6#*LGrA#i.LGra,LH)l(DIFF>LGrA#4-LGrR>24B6#,HP)) 
(V,(DIFF>LGrR#,a4),(DSEP#^M0V£#i.DIFF#i-EMBDi,LK))  ) 

(  (Z,(D££P#>M0V£#i-DIFF#i.£HBD#+LarR#,LK),(DS£P#>M0V£#i.DIFF#*£M3D#tU)) 

( V,  (D££P#M0V£#*DIFF#*’£4BD#,LK ) ,  ( DIfF+LGTR#,  HM) ) 

(V,(DIFF4-LGTA#*LGrRi.£rtBD#,HP),(DS£P#*M0V£#+DIFF#-*-EMBD#4-LGTA#*LGrR,LH))  ) 

(  (Z. (DIFF#*£M8D#,UM)  .(DIFFI+EMBDi+LGrRI.UM) ) 

( V,  (DSEP#-*-MOV£#-*-DlFF#+ErtBQ#+LGrA#'*'LGrR tGH) ,  (DIFF+LGrA#-*-LGrR>ErtBD*,HP) ) 
(V,(DIFF>LGra#,HM),(DSEP#i‘MOV£#i.DIFF#^£MBO#,UP))  ) 


(  (Z.  (DIFF#*£MBD#+LGrR#.UM) ,  (DIFF#*£MBD#. UM) ) 

( V,  (DSEPf+MOVEf+DIF^I+EriBOf.UP  j ,  (DIFF'*LGrR#,HM) 
(V,  (DIFF+LGrA#-*-LGrR>£MBD#,HP) ,  (DE£P#-*-M0V£#+5lFF 


#*EMBD#-*-LGr  A#+LGTR ,  UH ) )  ) 


D£EP#+MOV£#«-DIFF*+£MBD#,LK)) 
0SE?»+H0V£t+DlFF»+£ABDt , UP ) ) 


(  (Z1(DESP#+M0VE#+DIFF#4-^BD#.UP),(D2EP#*M3VE#«-DIFF#«-ErtBD#,UP)) 
{ V, (DIFFf+SMBD#, Urt) .  (QE£P»-^MuyE#4-0IFF #*'£MB0#,LK) ) 

( V , ( DEEP<MWV£#*DIFF#+£rtBD# , LK) , (DIFF#+EMBD# ,  UM ) )  ) 

(  ( Z ,  ( DSEP#>MOVE#>DIFF#>£MBD# ,  UP ) ,  ( DSEP#«-MOVS#«.DIFF#«-EMBD# ,  UP ) ) 
( V , ( EriBD, GR ) , ( ErtBD.GR ) ) 

( V, (DEEP#+M0V£#*DIFF#*Erf8D#, UP) , (DIFF#«-£rfBD#, UH) )  ) 

(  (Z.  (D£SP#+M0V£  J+DIr F#-f£rt8D#,UP  ) ,  (DSEP#+MOVE#i-DIFF#'*-EHBD#,UP  ) ) 
( V ,  ( DIFF #*£MB0# ,  Url) .  ( DSEP#*MOV£#«-DIFF#+ErtBD# , UP ) ) 

( V, (EMBO.GR ) ,  (EMBD.GR ) )  ) 


i  \  w  4  v  i  rTLMiuwr  |  wu  /  ■  \  wuui  « 

,  (EMBD.GR ) , (EMBD.GR ) )  ) 


(  ( Z .  (DSEP#+MOVS#>DIFF#+SMBDf ,  UP ) ,  ( DEEP*  «-MOVE#*-DIFF#>EMBD# ,  UP ) ) 

( V , FeMBD.GR ) , (EMBD.GR ) ) 

(V,(DIFF-*-LGrA+LGra#+D£EP#+SMBD*,SP),(DIFF#«-ErfBD#,U3))  ) 

(  (Z,  (D2EP#-»-M0V£tfi-DIFF#>EMB0#f  UP ) .  (DEEP#*M0VE#«-DIFF#4-£MBD#,UP ) ) 
(Vt7DIFF#^SHBD#,Ui),(DIFF^LGrAi-LarR#+DEEPai.£rtBD#,3P)) 

( V, (EMBD.GR ) , (EMBO.GR ) )  ) 

(  ( Z ,  (DSEP#>MOVE#>DIFF#+EMBD#*’LGrR# ,  UP ) .  ( DEEP#«-MOV£#>DIFF#«-EMBD# ,  UP ) ) 
( v , (DIFF#*SMBD# ,  UM) , (DIFF*LGrR#, HM) ) 

( V ,  (DIFF+LGrA#*LGTR«.SMBD# , HP ) ,  ( DIFF#+SMBD#+LGrA#+LGrR,MH) )  ) 

(  (Z,(DSEP#+MOV£#+DIFF#-i-SMBD#.UP),(DEEP#*MOV£#«.DIFF#*EMBD<M,GrR#,U?)) 
( V,  (DIFF#*£MBD#-*LGrA#*LGTH  ,MH) ,  (DIFF-fLGrA#*LGTR-.-£MBD#,HP) ) 
(V,(DIFFi-LGTR#,HM)f(DIFF#+EMBO#,Urt))  ) 

(  (ZI(D3EP#+M0V£#*DIFF#«-SMBD#,UP).(DSEP#«-M0V£#«.DIFF#*£MBD»,UP)) 

( V, (DIFF#+£MBD#.  UM) ,  (DIFF-*-LGrft*D3EP#+ErfBD#f  SM) ) 

( V ,  ( DIFF*LGTA*LGrR#+'D2EP#+SMBD#  t  SP ) ,  ( DIFF #+EMBD# ,  U3 ) )  ) 


,  ( DIFF #+EM8D# ,  US ) )  ) 


t.  ( D2£P#>M0VS#«-DIFF#+£MBD# .UP ) .  ( DSEPI+MOVElU-DIFFtoEMBD#, UP ) ) 
( V,  (DIFF #♦£.480#,  UM) ,  CDS£PM6v&#«-DlFF#i-£MBD#,  UP ) ) 
(V,(DIFF#*SMBD#,UM) , (DIFF#*EMBD# , UM) )  ) 


SP#*MO V£  #*DI FF#*EM8D# ,  UP ) ) 

UM) ) 

DIFF #*EMBD# ,  UM) )  ) 
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(  ( Z .  ( D££P#+MOVE#-*-DIFF#>£MBD# ,  UP ) ,  ( D££P»+MOV£#+DIFF»+£MBD* ,  UP ) ) 
iV, (EMBD,GR ) ,  (EMBD.GR ) ) 

( V. (EMBD.GR ) , (£MBD,GR ) )  ) 

[4 -DEG] 

[?- JUNCTIONS] 


(  (?, (EMBD.GR) , (EMBD.GR) ) 

(?, (EMBD.GR), (EMBD.GR) 

(?, (EMBD.GR), (EMBD.GR 
( ? , ( EMBD, GR ) , v  EMBD , GR ) 

[POINT-JUNCTIONS] 

(  (V,(D£EP#.*MOVB#*DIFF#«.EMBD#.LK),(DEEP#+MOVS#*DIFF#«.£MBD#.LK)) 

( V , ( DSEP#*MOVE#+DIFF#+EMBO# , LK ) , ( DEEP#+M0VE#+DIFF#+EMB6# , LK ) 
(V,  (DEEP#-fM0V£#+DIFF#+EM8D#,LK) , (DSEP#+MOV£#+DIFF#+EMBD#,LK; 
( V ,  ( DEEP#+MOV£#+DIFF#+£MBD# ,  LK ) ,  ( DE£P#-tMOVE#+DIFFf+EMBD# ,  LK ) 


) 


(  (Z, (DS£P#+MOV£#4-DIFF#+EMBD#,LK), (DEE P#+M0V£#*DIFF#4- EMBD#, LK) ) 
(V,(DEEP#*MOVE#«-DIFF#+EMBD#,LK)  ,(DEEP#-hMOV£#*DIFF#+£MBD#,LK)) 
(V,  (DS£P#+M0V£#4-DIFF#*£M8D#,LK) , ( DSEPI+MOVEf+DIFFf+EMBDf.LK) ) 

( V ,  ( DEEP#*MOV£#+DIFF#+EMBD# ,  LK ) ,  ( DS£P#+MOVfi#+DIFF#«-EMBD# ,  LK ) )  ) 


(  (V. (EMBD.GR), (EMBD.GR)) 


(  (V. (EMBD.GR), (EMBD.GR)) 


#+£MBDf , SP ) ) 
UP))  5 


^--^/f+EMBD#, US ) ) 

l  i  t  v  i/ayr  rmw  fbrTi/xr  it  rTbi'iuvr  y  nw  /  i  V  £  +LGTA^LGrR#4-DE£P#i-EMBDf  ,SP) ) 

(V,(DIFF+LGrR-*-DSEP#*EMBD#,Srt),(DSEP#-*-MOV£#«-DIFF#«-£MBD#,UP))  ) 

(  (V.(DIFF-*-LGrR+DSEP#4.EMBD#,SM),(DIFF#+EMBD#,UM)) 

( V ,  (DSSP#+MOV£#«-DIFFf-*-EMBD# ,  UP ) ,  ( DIFF+LGTR+DSEPP+EMBDf ,  SM) ) 

(V,  (DIFF+LGTA+LGrR#+’DEEP#4'£M3D#, 5P) ,  (DE£P#*M0V£#-*-DIFF#-*-£MBD#,M5) ) 
( V ,  ( DIFF #+EMBD# , US ) , (DIFF+LGrA+LGrR#fDSEP#+£MBD#,SP) )  ) 

[PENTA-JUNCTIONS] 


(C.( EMBD.GR), (EMBD.GR)) 

( V ,  ( DIFF-*-LGTA+LGTR#*D£EP#«.EMBD# ,  SP ) .  ( DIFFf+SMBD# . US ) ) 
(V,(DSEP#^M0VE#*DIFF#+£MBD#,UP),(DE£P#<-M0V£#*DIFF#+£MBD#,LK 


)) 


( V, ( DEE P#+MO VE#+DI FF #+SMBD#, LK) , (DEEP f+MOVE #+DIFF#*£MBD#, U?) )  ) 
(  (C.( EMBD.GR), (EMBD.GR)) 

\  V ,  ( DBEP#-fMOV£#+DIFF#+£MBD# ,  UP ) ,  ( DSEP#+MOVE#+DIFF#«-SMBD# ,  LK ) ) 

...  - - 5DgEP#^M0VlS#>DIFF#4.SMBDi  UP)) 

TR#*DEEP#*EMBD#,SP))  ) 


(V,  (DS£P#-tMOV£#+DIFF#*EMBD#,LK) ,  ( 

[v,(diff#+embd#,us),(diff+l4ta>lg 

(  (C. (DIFF#+EMBD#,UM) , (DIFF+LGTR+DEEPf+EMBDf.SM) ) 

(V,(DIFF+LGrA*LGTR#*DESP#«.EMBD#,3P).(DIFF#«>£M8D#,U3)) 
MDSEP#+MOV£#«.DIFF#*EMBD#,UP),(DS£P#*MOV£#*DIFF#*SMBD#,LK 
(V,(DSEP#*MOVE#*DIFF#+EMBD#,LK),(DESP#*MOVS#«-DIFF#fSMBD#,UP 

(  (C.  (DIFF-fLGTR+D2EP#*EMBD#  ,&M) , (DIFF#+SMBD#, UM) ) 

(V,(D3EP#+M0V£f+DIFF#*SMBD#,UPj((D3EP#+M0V£#«.DIFF#*£MBDf,LK 
(V,(DSEP#+M0V£#>DIFF#*SMBD#.LK),(DS£P#^M0V£#4.DIFF#4.EMBD#,UP)) 
(V,(DIFF#*EMBD#,U3),(DIFF«-L5TAi-LGrR#-»-D2EP#*EMBD#,SP))  ) 


1! 


!) 
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(  (2 

(  (2 


(EMBD.GR) .(EMBD.GR)) 

V, (D£eP#-*MOV£#+DIFF#+£MBD# , UP ) , (DSEP#+MOVfi#+DIFF#+SMBD#,LK ) ) 

V , ( D£EP0*MOVE#+DIFF#+£MBD# , LK  , C DS£P#*MOV£#«.DIFf #«-£MBD# , LK  j 
V,(DSEP#>MOVE#^DIFF#>SMBO#,LK),(DS£P#^MOVE#>DIFF#>£M9D#,UP))  ) 

(DIFF$*£M8D#, UM) , (DIFF#*SMBD#, UM) ) 

V , ( DEEP  iM40 Vt#+DIFF  #+£MBD# , UP ) ,  ( DSEPi^MOVEI+DIFF  #*SMBD# ,  LK ) ) 
V,(DEEP#*MOV£#+DIFF#+£MBD#,LK),(DSSP#.hMOV£#«.DIFF#*£MBD#,LK} 

V,  {DE£P#+MOV£#>DIFF&*£MBD#, LK) , iD2EP#*M0VE#+DIFF#+EMBD#, UP ) )  ) 

(EMBD.GR ) , (EMBD.GR ) ) 

V ,  ( DE£PP*MOVE P+DIFF *«-EMBD# ,  UP ) ,  ( DE£P#+MOV£ #*DIFF#«>£M8D# ,  LK ) ) 

V,  i DEEPt+MOVEP+DIFFO+EMBD#, LK )  ,  ( DEEP  ♦♦MOVE  •♦DIFFivEHBD#,  LK) ) 

V ,  ( DE£P#+M0VE#+DIFF#4-SMBD# , LK) ,  ( DEEPI+MOVEI+DIFFi+SMBDt, UP ) )  ) 

(DIFF#+£MBD#,UM) , (DIFF#+EMBD#,UM) ) 

V ,  (  D£EP#-kMO  V£ #-*-0 1 FF l+SMBO# ,  UP } ,  ( D££P#>MOV£ #*DIFF#*£MBD# ,  LK ) ) 

V ,  ( DS£P#*MOVE#+DIFF#+£MBD# , LK ) ,  ( DS£P»-*.MOV£#*DIFF#+£MBD# ,  LK ) ) 

V,  (DSEP#-kMOVE#*DIFF#+£MBD#,  LK) ,  (DS£P#«MOVE#+DIPF#*EMBD#, UP ) )  ) 


[D2G-5] 


[?-JUNCriONS] 


( 


(?, (EMBD.GR). (EMBD.GR]) 
t?,(£MSD,GR), (SMBD, GR 


?, (EMBD.GR, 

‘‘  lembdJgr 
embd.gr! 


embd.gr 

EMBD.GR 

EMBD.GR] 


) 


[POIMf- JUNCTIONS] 


( 


(V.  (DEEP#-*MOV£#>DIFF#*SMBDf  ,LK),  (D2EP#*M0V£#4-DIFF#«-£MBD#,LK) ) 
(v,(DEEP#-HMOVE#*DIFF**SMa6#,Lfc),(DSEP#*MOV£#«.DIFF#*SMB£>#,LK) 


V ,  ( DEEPP+M0V£#4.DIFF  #+EMBD# 
V,  (DEEPJ+MOVS^OIFF #+£MB0* 

:v,(dsep#-kmove#>diff»>embd# 


(  (Z,(D2EP#>MGV£**DIFF#+EM3D#.LK),(D£SP»*M0VE*«.DIFF#«-EMBD#.LK)) 

( V ,  (  D£EP#-kMOVE#*DIFF#-*-EMB0»  ,  LK ) ,  ( DSEP#-hMOV£<M>IFF#*£MBD#  ,  LK ) ) 

( V ,  (  DSEP#-kMOV£#>DIFF#+EMBD#  ,  LK  ,  ( D2EP#+M0VEf«-DIFF#*EMBDf ,  LK 
N ,  ( DEEP#*MOVE#*DIFF *«-EMBD* ,  LK ) ,  ( DSEP#«-MOV£#+DIFF#+£MBD# ,  LK ) ) 

( V ,  ( DS£P#-kMGVE**DIFF#+EM8D# ,  LK ) ,  ( DSEP*>MGV£#«-DIFF#+£MBD# ,  LK ) )  ) 

(  (V. (EMBD.GR). (EMBD.GR)) 

(V,(DIrF*LGr/ULGm«-DSEP0+EMBD#,SP),(DIFF#+EMBD#,US)) 

(V,  ( DEEP #*MQVEA+D IFF #*£M3D# ,  UP  ) ,  (DSs?0*MQV£#+-DIFF#+EMBD#,LK) ) 

( V ,  ( DSEP#*MGVE#*DIFF #+EMBD# , LS  ) .  ( DIFF+LG  T  A*LGrR#«>D2EP#«-£MBD# .  SP) ) 
(V,(DIFF-fLGTa«-DEEP**EMBD#,SM),(6E£P#*MOVE#+DIFF#t.3MBD#,UP))  ) 


(  (V 


(EMBD.GR). (EMBD.Ga)) 

V,  (DIFF+LGTA+LG TRADES P#+’EMBD#,SP),(DIFF<*+SMBD#, US ) ) 

V ,  (DSEP#'kM0V£#+DIFF#*SMBD#,M3)  ,  (DIFr*LGTA-*>LGTR<M)EEP#+SMBD#,SP)  ) 
V,(DIFF*LGTR+D£EP#-*-SMBD#,SM) , (DSEP#+MOV£#*DIFF#+SMBD#,LK) ) 
V,(DESP#-kMOV£#+DIFF#*£M86^LK),(DS£P#-hMOV£#*DIFF#«-SMB6#,UP))  ) 


(  (V 

(  (V 


(EMBD.GR), (EMBD.GR)) 

V,(D2£P#+MOVE0+DIFF#«.SMBD#,UP),(DE£P#*rtOVE**DIFF#+SMBD#,LK)) 

V,  (DS£P#+MGVE#-*-DIFF#+EMBD#,LK) , (DlFF+LGrR+DEEPP^EMBDf.SM) ] 

V ,  ( DIFF+LG rA-t-LGIR #+D£EP#*SM8D# .  SP ) .  ( DSEP#-hMOVE#+DIFF#+£MBD#  ,MS ) ) 
Y,  (DIFF#fEMBD*,  UB  ) ,  (DIFF+-LGTA+LGrR#+D£EP»>£MBD» ,SP) )  ) 

(EMBD.GR) .(EMBD.GR)) 

V, (DSaP#*MOV£#*DIFF#*EMBD#,UP) , (DIFF*LGrR+D£EPI*SMBD#,SM)] 

V , (DIFF*LGrA*LGrR#+DSEP£*£MBD#, SP) .  (DSEP#+MOVS#*-DIFF#^SMBD#.LS) ) 
V  ( DSEP#+MOVE#*DIFF#*Si-lBD# .  LK ) ,  ( D£i*P#+M0V£#-*-DIFF#*EM8D# ,  UP  ) ) 

V,  (DIFF#>EMBD»,Ui  ) ,  (DIFF+LGrA«-LGrR#*DEEP#*-EMB0#,3P) )  ) 


(  ( V ,  ( DIFF#«-EMBD# ,  UM ) .  ( DIFF+LGT R«-DSEP#«-EMBD# .SM ) ) 

V , ( DIFF+LGTA«-LGm«-DEEP#«-EMBD# , SP ) . ( DIFP#*£MBD# , U3 ) ) 

V,  lD3£P#*MGV£#+DIFF#*EMB0#, UP )  , (DScP#*M0V£#*DIFF#*EM8D#, IK) ) 

V, (DSEP#+M0V£#'*DIFF#*£MBD#,L3 ) , (DIFF+LGTA+LGrRf+DSEPf+EMBDf.SP) ) 
V ,  ( DIFF*LGTR-*-D3EP#«-£MBDI ,  SM ) ,  ( DEEP#*MOVE  #«.DIFF#+EMBD# ,  UP ) )  ) 


;  V , ( DIFF+LGTR+DSEP^EMBDf , 


^EP#>MOV£#>OIFF#+EMBD#, UP ) )  V 


(  (V , (DIFF#+EM8D#,UM) ,(DIFF+LGrR*DSEP#+EMBD#,SM) ) 

( V , ( DIFF+LGT  A«-LGrR#*D£EP#.*-EMBD# , SP ) . ( DIFF#+£MBD# ,  U3 ) ) 

(V,(D3EP#*M0V£#*DIFF#*EMBD#,MS}.rDIFF+lGrA-*43rR#+DEEP#-*-ErtBD#,SP)) 

(V.CDIFF^GTR+DEEP^ErtBDP.SMMOSBPf+MOVEP+DIFF^EMBDP.LK)) 

( V ,  ( DSEP#4-M0 V£#*DIFF#+EMBD# , LK ) ,  ( DSEP#-*-MOVE#*DIFF#*-EMBD# ,  UP ) )  ) 

(  (V. (DIFF+LGTR+D2EP#*EMBD#tSM) ,(DIFF#+£MBD#,UM) ) 

( V , ( DEEP#+MOV£#+DIFF#*SM8D# , UP j , ( DS£P#*MOV£#*DIFF#*£MBD# , LK) ) 
(V,(D2£P#4M0V£#*DIFF#«-£MBD#,LK),(DIFF*LGrRfDEEP#+EMBD#,SM)) 

( V ,  ( DIFF+LG TA+LG TR#fD£EP#«-EMBD# .  SP  j . ( D££P#*MO  VE#«-OIFF#«-EMBD# , MS ) ) 
(V, (DIFF#+SMBD#,U3 ) , ( DIFF+LGTA+LGT R#*D£EP#*EM8D# , SP ) )  ) 


(V , (DIFF+LGrA+LGTRi+DSEPi+EMBDi.SP) , (DS£P#*MOV£#*DIFF#+£MBbf .LSI 
( V ,  (DE£P#+MOVE#+DIFF#«-SMBD# , LK ) ,  ( D££P#-*MOVE#fDIFF#*-SMBD#,  UP ) } 

( V ,  ( DIFF#*£MBD# ,  U3 ) ,  ( DIFF+LurA+LGTR#+D££P#+£MBD# ,  3P ) )  ) 

[PENT A- JUNCTIONS] 

(  (C, ( EMBD.GR ), (EMBD.GR ) ) 

( V ,  ( DIFF*lStA«-LGT  R  #+D£EP#-*-EMBD# ,  SP  > ,  ( DIFF#«-£MBD# ,  U3 ) ) 

(V ,  ( DSEP#*MGV£#+DIFF #-*-£MBD# , UP ) ,  (DS£P#+-MOVE#+DIFr  f^-EMBD#, LK) ) 

( V ,  ( DSEP#+MO V£#+DIFF#+£M8D# ,  LK ) ,  ( D£EP#*MOVE#*DIFF#«-EM8D# ,  LK ) ) 

( V ,  ( DEEP#«-MOVfi#*DIFF#«-SMBDf .  LK ) ,  ( DEEP#«-MOV£#*DIFF#«-£MBD# ,  UP ) )  ) 

(  (C. (EMBD.GR) .(EMBD.GR)) 

V,  (DSEP#-kMOVE#>0IFF#>EMBD# , UP ) ,  ( D£EP#4-MOVE#+DIFF#*SMBD# , LK ) ) 

( V ,  (DEEP#*MOV£#+DlFF#+EMBD#,LK j ,  DSEP#+M0VE#^DIFF#4.£MBD#, LK ) ) 

(V,  (DE£P#«-M0VE#-»-DIFF#+EM8D#,  LK) ,  iDSEP#«MOV£#-*-DIFF#«-SM8D#,UP) ) 

( V , (DIFFI+EMBD# , U3 ) , (DIFF+LGTA+LGTRf+DEEPf+EMBD# , SP ) )  ) 


(C. (DIFF#+EMBD#,UM) , (DIFF+LGTR+DSEPI+EMBDf.SM) ) 

(V,{DIFF>LGTA«-LGrR#*D2EP#>EMBD»,SP).(DIFF#+£MBD#,US) 
{ V,  ( DE£P#*MOV£#*DIFF#*EMBD# ,  UP ) ,  ( DEEP# ♦MOV£#+DIFF#*£; 

fur  OiA  .  AJ.'urc*  «  .  atppi  »  ?uo  r\m  r  v  \  /  r\c*c*  oi »  u.w.?  « »  r\T  c?  a  .  cv 


V,(DSEP#+MOV£#-».DIFF#+EMBD#  LK 
[  V , ( DEEP #+MO V£ f+D I F F #+EMBD# , LK 


j  i  /  •  v  1/  ^  iir  ■’"Mi  *  ■  ww  /  t 

;  DEEPf+MOVEI+DIFr  #«-£MBD#,  LK 
DEEPf+MOVEf+DIFFi+SMBDi, LK 
!  DSEP#+M0VE#4-DIc'?t+EMBD# ,  UP 


M 


(  (C.njIFF+LGrR+DEEP^SMBDf.SMMDIFFifSMBDf.UM)) 
( V,  ( DSEP#+MOVfi#+DIFF#*EMBD* ,  UP ) ,  ( DS£P#«-M0\f£#* 


V ,  ( DSEP#-fMOV£#-*-DIFF#*EMBD# ,  UP ) ,  ( DSEPf+MuVE #*DIFF#+SMBD# ,  LK ) ) 
V,(DEEP#+MOV£#*.DIFF#+EMBD#,LK),  DSSP#+MOV£#*DIFF#*EMBD#,LK  ) 
V,  ( DSEP#*MGVE#+DIFF#-*-EMBD#.LK) ,  (DSSP#*M0V£#-*-DIFF#-*-SM8D#,UP ) ) 
V ,  ( DIFF#*£MBD# ,  U3 ) ,  ( DIFF+LGrA+LGTR#>DSEP#-*-EMBD# ,  SP ) )  ) 


(  (C.( EMBD.GR), (EMBD.GR)) 

V ,  ( D££P#*MOV£#+DIFF#-t-EM8D# ,  UP ) ,  (  DSEP#-kMOV£#*OIFF#«-EMBD#  ,  LK ) ) 
i.V,  (DESP#«-MOVE#+DIFF#*EMBD#,  LK) ,  (DS£P#+MOV£#*DIFF#«-EMBD#,LK) ) 
I.V,(D££P»-kMOV£#4-DIFF#*SMBD»,LK),(DSEP#-kMOV£#«-DIFF#«-EMBD#,LK)) 
(V,  ( D£EP#>M0 VE#+DIFF#+EMBD# , LK ) ,  ( DSEP#<-MOV£*<-DIFF#+£MBD# , UP ) )  ) 

(  (C, ( DIFFI+SMBD#, UM) . (DIFF#*SMBD#t  UM) ) 

( V ,  f  D3EP#*MOV£#*DI  FF  #*EM8D# ,  UP ),( DS£P#+MOV£#«-DIFF#«-EMBD# ,  LK ) ) 

( V ,  C  DEE P#*MQ V£#+DIFF#*EMBD# , LK ) ,  ( DEEP# ♦>MOV£#+DIFF l+EMBD# ,  LK )  j 
i  V,(D££P#*MOV£fi-DIFFf*£MBD#,LK  , (DS£P#>MOV£#*DIFF#*EMBD#,LK) J 
(V,(D£EP#-kMOV£#*DIFF#*SMBD#,LK),(DSEP#*MOV£#*DIFF#«.SMBD#,UP))  ) 

(  (Z. (EMBD.GR ) ,  (EMBD.GR  ) ) 

V ,  ( D3£P#*MOV£#*6lFF#*£MBD# ,  UP ) ,  ( D2EP#*MOV£#«-DIFF#«.EMBD#,  LK ) ) 
(V,  (DS£P#«-MOV£#*DIFF»fSMBD#,LK  ,  D£SPf*MOV£#*DIFF#*EMBDf ,LK) ) 
(V,(D2£P#+M0V£#4.DIFF#^EMBDf,LK),(D££P#>M0V£#4.DIFF#4.£MBD#,LK)) 
(V,(D£SP»4>M0V£#4.DIFF#4>SMBD#,LK),(D2£P*>M0V£#fDIFF#*-£MBD#,JP))  ) 


»,LK)) 
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Appendix  E 


Image  Formats 


E.l.  General  Image  format 

Images  of  any  resolution  and  pixel  sizes  from  1  bit  to  36  bits  may 
be  encoded.  Optional  user  information  may  be  encoded.  Several  such 
option  fields  commonly  used  are  included  in  this  description.  Programs 
are  available  for  digitizing  images  into  252x239x6  bit,  252x238x9  bit, 
and  504x476x6  bit  forms. 


Field (word) 


Description 


0 

1-3 

4 

5 

6 

7 

8 

1? 

11 

12 

13-15 


l6-(l5+m) 


Picture  Length  ( Image+Header) 

UP,  RIGHT,  LEFT  Linics/Markers 
(See  Color  Triples) 

Atom  Type  (Always  =  2) 

Highest  X  Pixel  Coordinate  (X  Dimension 
Minus  1)  -  High  X 

Highest  Y  Pixel  Coordinate  (Y  Dimension 
Minus  1)  -  High  Y 

Number  of  Gray  levels  (Quantization  Steps 
-  for  6  bit  images  *  54  steps)  -  GL 
Minimum  Pixel  Value  -  Min  I 
Maximum  Pixel  Value  -  Max  I 
Average  Pixel  Value  -  Ave  I 
Offset  to  Pixel  Pointer  List 
Offset  to  Picture  Data 
Offsets  to  Interleaved  Picture  Data 
(Only  for  504x476x6  bit  pictures 
directly  from  the  imaging  routines) 
User's  Fields  (m  is  words  of  user's 
fields) 
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Commonly  Computed  Fields 


Pixel  Size  s  (  C(  C(  0(11)  )  )  .and.  007700300DD0  )/l03D0D0D 


User  Fields  Length  s  C(  0(11)  )  -  0(16) 

Offset  to  First  Picture  Word  (Pixel  [High  7,0))  *  C(  0(12)  ) 

Offset  to  Pixel  X,7  *  Offset  to  First  Picture  Word  ♦  C(  0(1 i)  ♦ 

(  X»(7  -  High  7)  .mod.  HS/Pixel  Size)  )  )  ♦ 
(  X«(7  -  High  X)  /  (3S/Pixel  Size)  ) 

n  .and.  a  a  Logical  "AND"  of  n  and  a;  0(n)  =  Offset  n; 
n  .mod.  a  a  Remainder  of  n/m;  C(n)  a  Contents  of  n; 


Common  User  Defined  Fields  Description 


16  Time  (in  Milliseconds)  Since  Last 

laage  Was  Taken  (for  picture  sequences) 

17  Edge  Speeds  (See  Edge  Pictures)  -  E  Speed 

id  Edge  Position  Modifier  Field  size 
(See  Edge  Pictures)  -  £  Pos 


E.2.  Color  Image  Triples 


Color  images  are  stored  as  sequences  of  images  consisting  of  a 
blacic/white  image  followed  by  one  or  more  single  color  images  (red, 
green,  or  blue  spectra All  images  are  in  the  general  image  format. 
Tne  red,  green,  or  blue  images  aay  be  omitted.  Tne  red  image  is 
associated  with  the  UP  link  in  the  black/ white  image.  The  green  image 
is  associated  with  RIGHT,  and  the  blue  with  LEFT.  If  a  link  field  of 
the  black/white  image  is  non-zero,  the  associated  colored  image  is 
available  for  reading. 


£.3.  Image  Format  for  Edges 

The  output  of  all  the  edge  detection  systems  described  in  this 
thesis  are  in  the  form  of  image  triples.  Tne  first  image  is  the  input 
gray  scale  image  (dimensioned  (High  X  «■  1)  by  (High  7  ♦  1)).  The  next 
is  a  7  Edge  image  having  one  pixel  for  each  junction  of  pairs  of  pixels 
in  the  gray  scale  image  lying  on  a  vertical  line.  The  dimension  of  this 
image  is  (High  X  ♦  1)  by  (High  7).  Following  is  the  X  Edge  image, 
having  one  pixel  for  each  Junction  of  pairs  of  pixels  in  the  gray  scale 
image  lying  on  a  horizontal  line.  This  image  is  dimensioned  (High  X)  by 


■5^3 
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(High  Y  ♦  1). 

Each  non-zero  pixel  in  an  edge  picture  represents  a  significant 
intensity  transition  wicn  maximum  tendency  between  the  corresponding 
gray  scale  image  pixels.  The  edge  orientation  is  horizontal  (in  the  Y 
Edge  image)  or  vertical  (in  the  X  Edge  image).  Edge  pixels  carry  edge 
type,  transition  size,  and  fine  positioning  fields.  Edge  type  fields 
require  no  bits  or  one  bit.  Transition  size  field  requires  the  number 
bits  needed  to  encode  the  maximum  edge  speed  (see  Edge  Speed  field  in 
the  general  image  format  section).  Tne  edge  speed  is  an  index  to  the 
edge  mask  that  matched  the  local  intensity  phenomenon.  This  nominally 
requires  numbers  from  1  to  S.  The  basic  position  of  an  edge  is 
specified  by  its  location  in  the  picture  matrices,  however,  for  fine 
tuning  position  plus  or  minus  0.5  pixel  positions  the  edge  position 
modifier  field  is  used  (EE  pos).  The  size  of  this  field  is  variable 
(see  Edge  Position  Modifier  field  in  the  general  image  format  section). 

Edge  Pixel  Schematic 

“je  V 

Edge  Position  Fine  Tuning 
E  pospos  s  2*BE  pos  -  1 

Position  For  X  Edge  At  X,Y  =  (X  (ES  pos  -  E  pospos/2)/E  pospos, I) 
Position  For  Y  Edge  At  X,Y  =  (X,Y  ♦  (EE  pos  -  S  pospos/2)/E  pospos) 

E.4.  Image  Format  for  Regions 

The  data  sets  produced  after  region  aggregation  consist  of  image 
quadruples.  The  first  is  an  Intensity  image,  the  second  is  a  region 
index  image  (same  dimension  as  the  intensity  image),  and  the  last  two 
are  X  and  Y  edge  images  (formatted  as  in  edge  data  sets).  Each  pixel  in 
the  region  image  contains  a  region  marking  number.  Tnis  number  indexes 
into  a  region  list.  Each  region  has  one  region  list  entry,  containing 
the  following  fields: 


If 

■ft M.  ptirh'o*  Mje spttd. 


Field(word) 


Description 


0 

1 

2 

I 

1 

I 

10 

II 


Arts  (pixels) 

X  Center  of  Mess 

Y  Center  of  Hass 
Average  Intensity 

X  for  a  Pixel  in  tne  Region 

Y  for  a  Pixel  in  tne  Region 
Formation  Order 

X  Edge  Density 

Y  Edge  Density 
Perimeter 

Standard  Deviation  of  Intensity 
Perimeter  with  Frame  of  Picture 


Tne  region  list  is  written  as  a  binary  list  formatted  as  follows: 


Words 


Description 


1  High  X  for  Original  Image 

1  High  Y  for  Original  Image 

1  Gray  Levels  for  Original  Image 

1  Time  Original  Image  Was  Digitized 

1  Region  List  Atoms 

1*12  Words  of  Region  List  Atoms 


E.5.  V-S  And  V-S-3  Format 


Vertex-String[-5urface]  ( V  — S C -S ] )  format  graphs  are  written  in 
binary  to  the  filing  system  as  lists  of  atoms  and  links: 


Words  Description 


1  High  X  for  Original  Image 

1  High  Y  for  Original  Image 

1  Gray  Levels  for  Original  Image 

1  Time  (Optional...  In  V-S-3,  not  V-S) 

1  String  Atoms  (m) 

m*l0  Words  of  String  Atoms 

1  Edge  Point  Atoms  (n) 

n*4  Words  of  Edge  Point  Atoms 

1  Vertex  Atoms  (1) 

1*4  Words  of  Vertex  Atoms 

1  Vertex-String  Associations  (1) 

j  Words  of  Vertex-String  Associations 

Optional...  in  V-S-S  format: 

1  Region  Atoms  (i) 

1*12  Words  of  Region  Atoms 

(See  Region  List  Format  for  Each  Entry) 
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String  Atoas: 

Field (word) 


Description 


0 

1 

2 

l 

I 

l 

9 


Edges<l8, 18>,  Real  Edges<0,l8> 

RO  (Ieft)<l8,i8>,  R1  (rightXG,  i8> 

Average  Gray  Level  in  RO 

Average  Gray  Level  in  Ri 

Average  (Gray  Level )**2  in  RO 

Average  (Gray  Level),#2  in  Ri 

Average  (Gray  Level  RO  -  Gray  Level  Ri)**2 

Average  Boundary  Diffusion 

Low  VertexOo,  l8>,  High  Vertex<0,i8> 

Low  Edge  Index<l8, 18>,  High  Edge  Index<0,K 


••<n,m>  notation:  n  is  field  position  (bits  right  of  the  field) 
o  is  field  size  (in  bits) 


Edge  Atoas: 

Field(word) 


Description 


0  Subjeetive/ObJective<35, 1>, 

Transition  Size<28,7>, 

Edge  Index<l4.l4>, 

String  Index<0,14> 

1  X  Position 

2  Y  Position 

3  Tangent  Angle  (Radians) 


Vertex  Ajjpas: 


ield(word) 


Description 


0  X  Position 

1  Y  Position 

2  Position  Uncertainty 

3  Psuedo/Real<35, 1>, 

Vertex-String  Association  Low<io,i7>, 
Vertex-String  Association  High<0,i8> 

Vertex-String  Associations: 

Field(word)  Description 


0 


Vertex  Index<i8. i8>, 
String  Index<0,‘7> 


168 


S.6.  H-C  Format 


Trie  Half  Chunk  Graph  format  la  used  internally  to  the  multi-image 
correlation  system,  and  is  written  to  provide  input  to  the  object 
recognition  system.  When  written,  the  format  is  as  follows: 


Words 


Description 


1 

1 

1 

1 

m*8 

1 

n 

1 

i*7 

1 

i»4 

1 

j 


Object  Atom  Format: 

Field(word) 

o . 


1 

2-4 

5-7 


Time  Elapsed  Since  Last  Packet 

Total  Time  Elapsed 

Frame  Intervals  in  This  Packet 

Oojects  Detected  (Connected  H-C 

Graphs  After  Ooject  Segmentation)  -  m 

Words  of  Object  Atoms 

Humber  of  Regions/Surfaces  -  n 

Words  of  Object-Surface  Associations 

Number  of  H-C  Atoms  -  i 

Words  of  H-C  Atoms 

Number  of  XYZ  Atoms  -  i  (Same  as  H-C's) 
Words  of  XYZ  Atoms 
Number  of  H-C  Associations  -  j*i,4 
Words  of  H-C  Associations/Limcs: 

E-link,  Side  0 
E-link,  Side  1 
X-link,  Side  0 
X-link,  Side  ^ 


Description 


Object  Type  Flags, 

Ob, ect  Is  2-d<35, <>, 

Ob, ect  Type<33,2>, 

Type  Description 
Significant  Ooject 
Insignificant  Object 
Lighting  Effect  Object 
Ground  (Occluded)  Object 
Type<j0,2>. 

Type  Description 
3-d  Translation,  3-d  Rotation 
3-d  Translation,  2-d  Rotation 
3-d  Translation  Only 
No  Transform 
Object  Has  Curved  H-C's<32,i> 

Low  H-C  IndexOS,  1S>, 

High  H-C  Index<0.l8> 

X,  Y,  Z  Translations 

Alpha,  Beta,  Gamma  Rotation  Angles 


Half  Chun*  Atom  Format  (H-C): 
Field(word) 


Description 


169 


V 

5 


T,p.  H*j,Oj>;i8>. 

Curved/Vertex, 

Object /Non-object /Lighting 
Surface  Association^,  18> 

Tangent/Change  in  Tangent 
Length  Side  3 
Length  Side  1 
Properties  Side  0 
Properties  Side  1 

Prop#  Prop  Description 
Intensity  Chops 
Boundary  Diffusion 
Absolute  Lightness  (Darkest, 
Lightest) 

Relative  Lightness  (DarX/Light , 
Light/Darx) 

Tangency  Difference 
Ratio  Variance 
Side  Variance 
Reflective  Variance 
Roughness 

Real/Subjective  Index 
Depth  Variance 
Motion  Variance 
Region  Embeddedness  Index 
Labeling  Side  008. 18>, 

Labeling  Side  K0,i8> 

Linxed, 

Unlinxed*-, 

Unlinxed-. 

Pure  Highlight*-, 

Pure  Highlight-, 

Pure  Shadow*-, 

Pure  Shadow-, 

UnlinXed-  A  Ground, 

UulinXed*-  A  Snadow*-, 

UnlinXed-  A  Snadow*-, 

UnlinXed*-  A  Highlight*-, 

UnlinXed-  A  Highlight*-, 

LinXed  A  Snadow*-, 

LinXed  A  Highlight*- 


18 
1 1 
12 

\i 

15 


Optional...  Not  in  H-C  Format  Written  to  Filing  System: 


l 

9 

X  Center 

I  Center 

X  Side  0 

i5 

Y  Side  0 

1 1 

X  Side  1 

12 

Y  Side  1 
Spare  'Word 

13 
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XYZ  Atom  Format: 

Field(word) 


Description 


0  Curved/Vertex<35, 1>, 

Associated  H-C  Index  Number<i3, 17>, 
2D/3D<17, 1>, 

Frame  Number<0,17>  (Mot  on  Written  XYZ) 

1  X  Coordinate  for  H-C  Center 

2  Y  Coordinate  for  H-C  Center 

3  Z  Coordinate  for  H-C  Center  (0  for  2D) 

Object-Surface  Association: 

Field(word)  Description 


0  Surface/Region  Number08, 18>, 

Object  Graph  Mumber<0,17> 


E.7.  Ooject  Library  Format 

Streams  of  object  packets  are  read  by  the  recognition  program,  one 
isolated  subgraph  at  a  time,  to  be  incorporated  into  the  object  library. 
This  is  done  by  either  matching  an  existing  object  library  entry  or  by 
forming  a  new  one  (if  no  initial  object  library  file  exists,  the  first 
object  matches  nothing  and  is  used  to  begin  a  library).  Library  files 
are  made  up  of  a  components  list,  followed  by  a  stream  of  object  graphs 
(expressed  in  terms  of  references  to  the  atoms  in  the  components  list): 


Words 


Description 


1 


m*2 

1 

n*20 

? 


Object  Graph  Format: 
Words 


Number  of  H-C  Prototypes  in  the 

Components  List  -  m 

Words  of  H-C  Prototype  Atoms 

Number  of  Object  Graphs  -  n 

Words  of  Ooject  Grapn  Directory  Entries 

Object  Grapns,  Each  Start  on  a  New  Block 

Boundary 


Description 


1  Number  of  H-C  Component  Indexes  -  m 

m  Words  of  H-C  Component  Indexes 

m*5  H-C  Links: 

£-link,  Side  0 
E-link,  Side  1 
X-link,  Side  0 
X-link,  Side  1 
R-link 


h-C  Component  Format: 
Field(word) 


Description 


Non-shape  Properties,  Side  008, 8>, 
Non-shape  Properties,  Side  1<27,8>, 

(selected  from  H-C  Properties 
Length  Ratio  Property<0, i8> 

Curvature  Angle  (Change  of  Tangent) 
Property08, io>, 

Index  into  Components  List<0,l8> 


Object  Graph  Directory  Entry  Format: 

Field(word)  Description 


Object  Graph  First  Bloc*<0, i8>, 
Extension  Graph  First  BlocWOS, 18> 
Blocks  in  Graph<0,1o>. 

Time  Entry  Was  MadeOo,i8> 

Pointer  To  first  Order  Histogram 
(In  Memory  0nly)<0,18>, 
Histogram  Offset 

(In  Memory  0nly)<l8, 18>. 

Index  To  First  Object  Graph<0, l8> 

(In  Extension  Graph  Entries), 
Object  Name 
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Required  by  Ail: 


LIBVI5.REL 
r ASLI3. REL 
VISION . BEG 
TTn0.3SG 
NUMI0.3SG 
FILIO.BEG 
MISC.3EG 
CALLI.BEG 
MAIN .BEG 
FLOAT. BEG 
PICTUR.3EG 
STRUCT. BEG 


Appendix  F 


Programs  and  Functions 


Vision  and  Grapr.ics  Runtime  Library 

Picture  Digitizing  Runtime  Library 

Vision  and  Grapr.ics  REQ'd  Files 

Terminal  Input/Output 

Numerical  Input/Output 

File  Input/Output 

Keyword  Processing 

System  Calls 

Prologs  and  Epilogs 

Floating  Point  Conversions 

Picture  Format  Definitions 

Array  Definitions 
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P.2.  Haiti  Line  System 


Program( . lang) 

P3NAP(. BLISS) 

Requires 

SM( .BLISS) 

EDGES (.BLISS) 

EP.BEG 

S.BEG 

FILTER (.BLISS) 

EP.BEG 

REGION (.BLISS) 

AREA. BEG 

PI. BEG 

EP.BEG 

£LIST(. BLISS) 

EP.BEG 

SQ.BEG 

PI. BEG 
STRING. BEG 

STRING (.BLISS) 

SQ.BEG 

PI. BEG 
COMPAR.BEG 
STRING. BEG 

REGSTR(. BLISS) 

ST ATS. BEG 
SQ.BEG 

PI. BEG 
STRING. BEG 
AREA. BEG 

SML3TR(. BLISS) 

SQ.BEG 

PI. BEG 
STRING. BEG 
AREA. BEG 

CORLAB( .BLISS) 

C0RLA8.BEG 
SQ.BEG 
STRING. BEG 
AREA. BEG 
PI. BEG 

ST ATS. BEG 
HC.BEG 
CORLAB.CCL 

GHIST(. BLISS) 

PI. BEG 

SQ.BEG 

HC.BEG 

CL. BEG 
GHIST.CCL 

Description 


Frame  Integrate  3  Images 
Reads  From  Silicon  Vidicon 
Writes  To  a  Picture  File 
Smooth-Eahance 

Reads  and  Writes  Picture  Files 

Detect  Variable  Transition  Edges 

•Multiple  Template  Operator 

Reads  From  a  Picture  File 

Writes  Edge  Triple  File 

Remove  Isolated  Edges 

Reads  and  Writes  Edge  Triple  Files 

Detect  Connected  Regions 

-Multiple  Homogeneity  Operators 

Reads  Edge  Triple  Files 

Writes  Region  Quad  Files 

Writes  Region  Lists  Files 

Make  Edge  Lists 

Reads  Region  Quad  Files 

Writes  Edge  Lists  Files 

Make  V-3  Grapns 
Reads  Edge  Lists  Files 
Writes  V-S  Files 

Make  V-3 -3  Graphs 

Reads  V-S  Files 

Reads  Region  Lists  Files 

Writes  V-S-S  Files 

Writes  Statistical  Summary  Files 

Resegment  V-S-S  Graphs 

Reads  and  Writes  V-S-S  Files 


Convert  V-S-S  Graphs  to  H-C  Graphs 
Correlate  H-C  Grapns  for  Depth 
Correlate  H-C  Graphs  for  Motion 
Label  H-C  Graphs  and  Form  Object 
Graphs  from  Isolated  Subgraphs 
Reads  V-S-S  Graphs 
Reads  Parameter  Files 
Reads  Vertex/Rsgion  Labels  Files 
Writes  H-C  Files 

Label  Objects,  Form  Object  Libraries 
Reads  H-C  Files 
Reads  Object  Library  Files 
Writes  Object  Library  Files 


•  3.  Otr.er  Programs 


17<» 


Program( .lang) 


Description 


CHEESE ( .BLISS) 

FREEZE C .BLISS) 
SM2( .BLISS) 
EXSQ( .BLISS) 
PIX3(. BLISS) 
BPIX3(. BLISS) 
D(  .BLISS) 
£PLOT{. BLISS) 
EEXAMC .BLISS) 
RAMR(. BLISS) 

EELIST ( .BLISS ) 
P3TR ( .3LIS3 ) 
DI5?3(  .BLISi) 
SEEL 13 ( .BLISS) 


Take  252x233x6  Bit  or  SO^xUTSxS  Bit  Image  Sequences 
-Color  or  B/W 

Take  252x233x6  Bit  Motion  Sequence 
Saootn-Ennance  Images 
Expand-Squeeze  Resolution 
Display  uray  Scale/Color  Images 
Display  Binary  Pictures 
Display  Binary  Pictures  On  Plasma  Panel 
Display  Edge  Detection  Plots 
Display  Edge  Triples  and  Region  Quad  Files 
Display  in  Color  Edge  Triples  and  Region 
Quad  Files 

Display  Edge  Lists  Files 

Display  Region  Lists.  V-S,  and  V-S-5  Files 

Display  H-C  Files  (DlSPO.CCL) 

Print  Object  LiDrary  Directory  Information 


MEDG£S( .3LI35)  Edge  Detection  -  Adaptive  Thresnold  Method 

ER(. BLISS)  Edge  Detection  -  Rosenfeld,  Yakiaovsky, 

Linear  Difference 

£W(  .BLISS)  Edge  Detection  -  Hueckel,  Psuedo-Hueckel 


ARITH(. BLISS) 
S3(. BLISS) 


MM( .BLISS) 
SMILEX? .MACRO) 
PRINTX( .MACRO ) 
EPARMC .BLISS) 
ETUNE(.3LISS) 
EDX( .3LISS ) 
SEDX( .BLISS) 
FRAMED ( .BLISS) 
PHIST( .BLISS) 
RDHIST  .BLISS) 
PICN03( .BLISS) 
£M( .BLISS) 
CTOmT. BLISS) 
MT0C( .BLISS) 
MREG3 ( .3LI53 ) 
RSET( .BLISS) 
RCVT ( .3LISS ) 
RVSPACC . BLISS) 
P3PAC£( .BLISS) 
M( .BLISS) 


Picture  Arithmetic 
Merge  Edge  Files  For  Color-Texture 
Suoerposition 

Make  Textured  Regions  Mask 

Expand-Reduce  Image  Resolution/Camera  System 

Expand-Reduce  Image  Resolution/Fiies 

Compute  Edge  Detector  Parameters 

Plot  Edge  Detector  Tuning  Plots 

Plot  Edge  Detector  Tuning  Plots 

Plot  Edge  Detector  Tuning  Plots 

Select  a  Window  from  an  Image 

Plot  Histograms,  Slices  from  Image  Files 

Plot  Image  as  a  3-D  Function 

Estimate  Image  Noise  Statistics 

Edge  Texture  Image  Generator 

V-o-S  Binary  Format  To  ASCII  List  Format 

V-3-S  ASCII  List  Format  to  Binary  Format 

Merge  Region  List  Entries 

Make  Artificial  Images 

Scale  Colors  For  Image  Display 

Edit  V-S-S  Sequences 

Edit  Edge  Triples  Sequences 

Edit  Image  Sequences 
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